
Big Data and Analytics


Big Data System (BDS) represents a digital transformation from traditional structured data processing (RDBMS) to the Hadoop environment and Natural Language Processing (NLP) for unstructured data. It facilitates better and faster decision-making. Big Data has come like a tidal wave, which continues to move swiftly across most industries and organizations globally. Concurrently, a number of supporting technologies have emerged that make it possible to store and quickly process vast amounts of data, flowing from anywhere to anywhere, at any time, in a variety of forms and formats. BDS helps by quickly extracting relevant information for better decision-making. Big Data is one of the fastest-emerging technologies. It has great potential to impact our working culture, business processes/practices and business strategies.

Of late, social media has opened the floodgates to data exchange, letting people interact across the world on a 24x7 basis. There are no geographical boundaries, time zones, or fixed formats or types of data. Earlier, business houses/enterprises dealt with large data using their own leased lines and data centers. These configurations cannot cope with today’s data deluge, which includes structured (RDBMS), semi-structured (key-value pairs, as in MongoDB) and unstructured data (text files, video clips, photographs, comments and likes, emails, SMSs, and posts on social media like Facebook, Instagram, Telegram and WhatsApp). Most social media data is unstructured and non-relational, requiring a high-tech workforce to pre-process it and make it available to the end-user in the required format. Fortunately, with easy and speedy connectivity available through the internet and upcoming 5G mobile communication networks, one can organize BD using Cloud Computing. To meet such demands, software tools like Hadoop, MapReduce, Pig and Hive, and languages like Ruby and Python, are essential for dealing with the Big Data environment.

Google, Oracle, IBM, Microsoft, Rackspace and Amazon are already using and offering their cloud services to handle BD. These IT world leaders provide relevant information to their customers for marketing, sales, CRM, SCM and/or political campaigns. It is estimated that India will have a 32% share of the Big Data global market by 2025. Despite initial apprehensions about data ownership, security and privacy, Cloud Computing and Big Data are the new growth engines for any business.

Rapid Growth of Big Data.  Due to the easy availability of Cloud Computing, Machine Learning (ML) and smarter Sensors, BD continues to grow very rapidly.  After initial apprehensions, a large number of industries/organizations are now adopting Big Data technology. Some main factors for the rapid growth of BD are:

·         Availability of Cloud Computing, with the elasticity of resources, assured security and integrity of data.

·         ML with improved algorithms, for information processing and decision making.

·         Advances in NLP for handling multilingual text and voice messages.

·         Availability of smarter and more compact data sensors and actuators.

·         IoT providing real-time or near-real-time data from multiple locations.

·         Better network connectivity, through improved internet services and the arrival of 5G mobile communication networks.

·         Demand from healthcare services for better patient management, utilization of resources and research for disease control, as in the Corona pandemic.

Characteristics of Big Data. A traditional database, even one of a few terabytes, is not considered Big Data, since it can be handled using a traditional RDBMS like Oracle, MySQL or IBM DB2. In large enterprises, the concepts of data warehouses and data mining have been in vogue for over 15 years. These organizations have distributed computing environments using their own leased lines, or a private cloud using internet facilities. Big Data is a new approach to tackling data problems related to unstructured and multi-format data, which are unsolvable using traditional RDBMS tools. There are four distinct characteristics of Big Data, popularly called the four Vs (V4). These are briefly described below:

·         Volume: It relates to very high volumes of data, in the range of terabytes and even petabytes. The data is huge and keeps growing continuously at great speed, round the clock (24x7), throughout the year.

·         Variety: Data is organized in multiple structures, ranging from raw (unstructured) data through semi-structured data to structured data (stored in rows and columns). To make things even more complex, data can be text, an email with or without an attachment, an SMS, a WhatsApp or Telegram message, a Tweet, an Instagram post, a photo, or a video or sound clip, in any language or format.

·         Velocity: Data from registered customers, clients and business partners usually comes through normal channels in expected formats, and can arrive at any time over leased data lines or the internet. However, data from social media networks, feedback from online trading and reports from front-line press reporters can also arrive at any time. Data could also come simultaneously from many locations around the globe.

·         Veracity: It relates to the accuracy and trustworthiness of data, which forms the basis for timely and accurate decision making.

Sources of Big Data. There are basically three categories of data:

·         Internal Data. The organization generates its own data and has full control over it. This data includes the corporate database, internal documents (SOPs, policies, instructions), in-house call center data, website logs, and data coming from sensors and controllers deployed at various locations.

·         External Data. It is public data, or data generated outside the organization. As such, the organization neither owns nor controls it. This data includes social media data (from Facebook, Telegram and the like), statistical data, public domain data and Machine Learning data.

·         Environment Data. This relates to weather data, soil data, water sources data, road/sea/air data and healthcare data.

Need for Big Data. Accurate and timely information is the key to success in any business. Earlier, the DBMS environment consisted of an RDBMS (Oracle, DB2, MS SQL, MySQL) as the back end, a data schema and a web server. This traditional RDBMS approach is ill-suited to handling present-day Big Data. Today, we deal with thousands of gigabytes of data in various formats, coming as LinkedIn posts, blog posts, tweets, Facebook posts, and Instagram and Telegram messages. Social media interactions in the form of text, video clips and photos generate huge data traffic on a 24x7/365 basis. Some data is structured and stored in the traditional RDBMS way, while other data, including documents related to customers, service records, and even pictures and videos, is stored in unstructured forms. In addition, large volumes of data are generated by machines and by sensors deployed inside machinery, on the ground and in aerial vehicles like aircraft, drones and satellites. Other external information is human-generated on social media. Such diverse data flows across various industries, institutions, healthcare centers, Research and Development (R&D) labs and Government Departments. Therefore, you have to think about managing data differently and apply Big Data technology.

BD Cloud Computing Environment. BDS is heavily dependent on Cloud Computing and IoT environments, where data centre nodes continuously log messages and track various transactions. It is important to gather real-time or near-real-time data and store it on multi-server clusters, but it is still more important to extract hidden information and present a comprehensive report to the customer/user in a format he/she can easily comprehend and use to make better decisions. This huge volume of data requires parallel processing and a special approach to storing data on multiple cluster computers (nodes). In addition, a Big Data solution needs automatic scalability and recovery. To cope with ever-growing data volume, the system should allocate more nodes, with the data redistributed among them automatically and seamlessly. All this is now possible thanks to the easy availability of cloud computing environments.
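
The idea of spreading records over cluster nodes and redistributing them when a node is added can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, record keys and the simple hash-modulo placement rule are hypothetical and are not taken from any particular Big Data product.

```python
import hashlib
from collections import defaultdict


def node_for(key, nodes):
    """Pick a node for a record key using a stable hash (modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Group record keys by the node that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return dict(placement)


# Hypothetical record keys and node names, for illustration only.
keys = ["record-%d" % i for i in range(1000)]
before = distribute(keys, ["node-1", "node-2", "node-3"])
after = distribute(keys, ["node-1", "node-2", "node-3", "node-4"])

# Show how many records each node holds once the fourth node is added.
for node in sorted(after):
    print(node, len(after[node]))
```

With plain modulo placement, adding a node moves a large share of the keys. Real systems often use schemes such as consistent hashing to limit that movement, which this sketch does not attempt.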

Big Data Architecture. Big Data represents a “log of records” where each record describes some event, like a purchase in a store, a customer visit to a retail store, a web page viewed by an online buyer, a sensor reading at a given moment, online customer feedback or a short message (like a comment) on a social network. An architecture like that of an RDBMS can’t handle Big Data complexities, bug fixing and time-bound tedious operations. As data volume and complexity increase and customers expect faster response times, you need a different architecture for storing and manipulating data and for displaying timely and accurate information. Big Data systems should efficiently handle data volume, complexity and scalability.

A number of Big Data architectures were developed and deployed by Google, IBM and Microsoft. Amazon, which started a bit late, caught up fast and created Dynamo, a new distributed DBMS. The open-source community also moved fast and gave a boost to Big Data by evolving new architectures and software tools like Hadoop, Hive, Pig, HBase, Apache Sqoop and MongoDB. As of 2022, a number of architectures are in use for Big Data. One simpler approach is the Lambda Architecture, which has become popular because it avoids the complexities of traditional architecture and is easy to get started with.
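
The article names the Lambda Architecture without describing it. As it is commonly described, it combines a batch layer that recomputes views from the full event log, a speed layer that covers events not yet absorbed by the batch run, and a serving layer that merges the two. The following is a minimal sketch of that idea; the page-view events and function names are hypothetical.

```python
from collections import Counter


def batch_view(master_log):
    """Batch layer: recompute page-view counts from the full, immutable log."""
    return Counter(event["page"] for event in master_log)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serving layer: merge the batch and real-time views to answer a query."""
    return batch[page] + speed[page]


# Hypothetical events, for illustration only.
master_log = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
recent_events = [{"page": "/home"}]

batch = batch_view(master_log)
speed = speed_view(recent_events)
print(query("/home", batch, speed))  # 3
```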

 Some Desirable features of Big Data architecture are:

•   Simple Design.  As we know, a complex system design is more likely to develop faults and harder to debug and maintain. To overcome complexity, the design of algorithms and modules should be simple.

•   Ad-hoc Queries. The database should be able to handle ad-hoc queries easily and efficiently.

•   Extensibility. It should be easy to incorporate changes when a customer makes a change in their business process/rules. Thus, the system should be able to accommodate additional functionality with minimal development effort and cost.

•   Quick response. In traditional systems, the Software Development Agency (SDA) carries out load tests and stress tests, using software testing tools like Mercury LoadRunner, to ensure acceptable response time during peak load. The SDA fine-tunes hardware devices and optimizes the software design to meet customer requirements. A BD system should be very fast while reading, updating and retrieving information for display on the screen. Customers become impatient if the response is slow. Most applications require a response within a few milliseconds.

•  Resilience. Big Data systems must be fault-tolerant, so that they continue performing reliably and efficiently even when some servers in some clusters go down. The system should also be tolerant of human faults. Its recovery mechanism should be so efficient that the end-user does not feel any disruption.

•    Consistency of Data. RDBMS-based distributed databases have issues related to the consistency of data, the duplication of data, concurrency of data and maintaining backup data at multiple locations. Big Data systems must be robust enough to avoid such limitations.

•    Scalability. Scalability is the ability to maintain consistent performance whenever there is a sudden surge or drop in incoming data.

•    Wider Applicability. Big Data systems should support a wide range of applications in sectors like Financial, Insurance, Healthcare, Banking, e-Commerce, Social Media Analytics and Scientific applications.

•   Minimal maintenance.  The Big Data systems should be able to carry out any scheduled maintenance without any slowdown in response time or any inconvenience to the customers.
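
The Resilience feature in the list above, where the end-user should not notice when a server goes down, can be illustrated with a replicated read that falls back to the next copy. This is a minimal sketch with hypothetical `Node` and `resilient_read` names, not the recovery mechanism of any specific system.

```python
class Node:
    """A toy storage node holding a copy of the data."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = data
        self.alive = alive

    def read(self, key):
        # A node that is down refuses the request.
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        return self.data[key]


def resilient_read(key, replicas):
    """Try each replica in turn so one failed server does not fail the read."""
    for node in replicas:
        try:
            return node.read(key)
        except ConnectionError:
            continue  # fall through to the next replica
    raise RuntimeError("all replicas are down")


# Hypothetical cluster: the first replica is down, the second still serves.
record = {"order-42": "shipped"}
replicas = [Node("node-1", record, alive=False), Node("node-2", record)]
print(resilient_read("order-42", replicas))  # shipped
```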

Software Tools. Collecting huge amounts of unstructured data from various sources, in various formats and various languages helps the end user only when the system can quickly aggregate data, process data and display meaningful information to the user. This task is carried out by the software programmers who apply the Hadoop framework and necessary software tools to drill down and extract data relevant to the user for decision making.  Some important software tools used by software experts are briefly given below:

•    Hadoop. Hadoop is an open-source framework written in Java and the most popular framework for working in a Big Data environment. It is used for distributed storage of very large data sets and is capable of parallel data processing. Hadoop breaks large data into smaller blocks to be processed separately on different data nodes (servers). It then automatically collects the results from the multiple nodes and uses MapReduce to compile them into a single output.

•    MapReduce. It is a framework to process large unstructured data sets in a distributed manner by using a large number of nodes.

•    HDFS. The Hadoop Distributed File System (HDFS) allows multiple files to be stored simultaneously at multiple locations and processed concurrently. Customers need not worry about the location of their files, as those could be stored on any server, in any cluster, anywhere in the world. To the end-user, these data files appear to be in one location.

•    Pig. Pig is a scripting language and an essential component that sits on top of the Hadoop framework for processing large data sets. It is an open-source, higher-level alternative to writing MapReduce programs by hand.

•     Hive. Like Pig, Hive is a component that sits on top of the Hadoop framework for processing large data sets. Hive translates SQL-like queries into MapReduce code, so the user need not write any code in Java or Python. Like Pig, Hive is an open-source alternative to hand-written MapReduce programs. However, certain jobs can be executed more effectively using Hadoop MapReduce directly rather than Pig or Hive scripts.

•     Apache Spark. It is a framework used for in-memory parallel data processing, which makes near real-time analytics possible.

•     HBase.  It is a NoSQL database that allows for the high-performance processing of information at a massive scale.

•    MongoDB. It is an open-source database with the capability to handle both structured and unstructured data. MongoDB is quite popular for storing, processing and analyzing Big Data.

•    Sqoop. This is an Apache software tool used for two-way data exchange between an RDBMS and Hadoop data stores such as Hive and HBase.

•    Impala. Impala is a very handy software tool that uses SQL queries to access data directly from HDFS.
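
The MapReduce item above describes processing data in a distributed manner and compiling the results into a single output. The shape of that computation can be sketched in plain Python on a single machine. This is an illustration of the map, shuffle and reduce steps only; it does not use Hadoop, and the input blocks are hypothetical.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: combine the values for one key into a single result."""
    return word, sum(counts)


def word_count(blocks):
    """Run map over every block, then shuffle and reduce into one output."""
    pairs = [pair for block in blocks for pair in map_phase(block)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


# Each string stands in for a data block that a separate node would process.
blocks = ["big data big insight", "data flows fast"]
result = word_count(blocks)
print(result["big"], result["data"])  # 2 2
```

In a real cluster the map calls would run in parallel on the nodes that hold each block; here they run one after another in a single process.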

Big Data GPS Integration. The Global Positioning System (GPS) helps us locate vehicles, ships/boats and people through satellite tracking. Integrating GPS with Big Data systems is a great help to moving vehicles, boats, ships and people. Say you are driving on the expressway from city A to city B: GPS can use real-time data to determine your location and the approaching petrol pumps, restaurants or resting places for a short halt. Big Data also holds information about your liking for a particular type of food. The integrated system knows the time of day and sends a message to your smartphone or vehicle dashboard that you are nearing your favorite restaurant. Indeed, you will feel excited to get free, personalized information reaching you well in time.
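
A minimal sketch of the restaurant example follows: given a GPS position and stored food preferences, it finds nearby places worth alerting the driver about. The coordinates, place names and the `nearby_favourites` helper are hypothetical. Distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearby_favourites(position, places, liked_cuisines, radius_km):
    """Return names of places within radius_km that match the driver's tastes."""
    lat, lon = position
    return [
        place["name"]
        for place in places
        if place["cuisine"] in liked_cuisines
        and haversine_km(lat, lon, place["lat"], place["lon"]) <= radius_km
    ]


# Hypothetical vehicle position, places and preferences, for illustration only.
position = (28.00, 77.00)
places = [
    {"name": "Dosa Point", "lat": 28.05, "lon": 77.00, "cuisine": "south indian"},
    {"name": "Far Dosa", "lat": 29.00, "lon": 77.00, "cuisine": "south indian"},
    {"name": "Burger Stop", "lat": 28.02, "lon": 77.00, "cuisine": "fast food"},
]
print(nearby_favourites(position, places, {"south indian"}, radius_km=10))
```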

Big Data Applications.  Big Data Technology is a boon for large enterprises operating globally, to grow fast and provide greater satisfaction to their channel partners, customers and distributors (Retail stores). Many BD applications are already in use and many more are on their way. Some common examples are briefly given in subsequent sections.

·    Customer Shopping. Amazon uses Big Data to record our buying habits, financial capacity and likes. It even informs us of new brand arrivals. The BD system knows what you want to buy. There are many Big Data service providers who supply shopping data to various malls and business houses. This is a win-win strategy for earning big revenue, both for Big Data software companies and for shopping malls.

·         Cinema / Theatre Ticketing. One popular example of recommendation is Netflix, which keeps a record of the movies you choose and recommends similar titles. A ticketing system can apply the same approach: by recording your choice of movies and the frequency of your visits to theatres/cinema halls, it can automatically recommend movies being shown in the current and coming weeks in various theatres/cinema halls. This can trigger your interest in buying tickets.

·         Road Route Planning. For global road navigation on a 24x7 basis, Google Maps uses Big Data and special software tools to tell vehicle drivers the fastest route with the least traffic congestion. With Google Maps fully integrated with GPS, the system guides us by voice all along the route, announcing approaching fuel stations, restrooms, restaurants and picnic spots, and even where to turn. All this makes our travel stress-free and enjoyable.

·         Analytics for Preventive Maintenance. Based on a field survey of a vehicle's performance, a manufacturer, say Mercedes or Toyota, can recall a particular brand/model of vehicle for preventive maintenance or retrofitting of a critical component. This delights the vehicle owners.

·         Customer Analytics (CA). Big Data is well suited for CRM functions, particularly for companies with a global business. CA is a software tool designed for CRM functions like tracking customers’ likes/dislikes, wish-lists, buying habits, frequency of buying and financial capacity. For instance, a multinational company ABC in India has a global reach, with over 5 million customers and 1500 retailers. ABC has six manufacturing plants: two in India and one each in Sri Lanka, Bangladesh, Nepal and Dubai. To manage the requirements of such a large number of customers and retailers for collecting, stocking and distributing goods, the Corporate Head Quarter (CHQ) at Delhi needs to analyze huge amounts of data in many formats and from multiple locations. For good decision-making, CHQ needs timely, accurate and well-formatted data. This is an ongoing process on a 24x7 basis. CA is equally beneficial for customers, retailers and CHQ. Let us take a typical case of a customer “X” who visits a retail store “S” on a weekend. As Mr./Ms. “X” enters the store “S” and swipes his/her ID card, his/her shopping information is automatically picked up and transmitted across the network. This information could be his/her personal data and shopping habits. The automated system can even pre-inform special customers about the arrival of a special brand and prompt them to buy it.

·         Industrial Analytics (Manufacturing). Preventive maintenance is one of the examples of how manufacturers can use Big Data. Breakdown of vehicles can seriously impact all the related processes, manufacturing, and transportation processes. To mitigate this, vehicle manufacturers should take timely proactive action. They should suitably place sensors near the vehicle engine to gather field data and carry out preventive maintenance. For this, the company should be collecting and analyzing sensors data for several months to form a history of defects. Based on this historical data, the Analytic can identify a set of patterns that are likely to result in a mechanical breakdown. For instance, the system recognizes that pattern formed by temperature sensors is similar to the pre-failure situation and alerts the maintenance team to check the machinery and fix it.  

·         Business Process Analytics (BPA).  Today some companies are already using Big Data Analytics to monitor the performance of their remote employees, truck drivers and field salesmen and improve their efficiency. For example, transport companies can collect and store telemetry data that comes from each truck /car in real-time. BPA can identify the typical behaviour of each driver. From this data, the company can plan safe driving conditions for their drivers by enforcing regular and timely halts for rest.

·         Analytic Fraud Detection (AFD). Many leading banks are already using AFD to detect credit card fraud in real time and send alert messages to their registered customer (the actual card owner). Suppose you reside in India (Delhi) and someone tries to shop in Dubai using your credit card details: the bank can check from your social network whether you are visiting Dubai on that date. Thus, banks can protect their customers from fraud.

·         Healthcare Analytics (HCA). Big Data Analytics can help healthcare policymakers make better decisions and contribute towards better public healthcare services with greater satisfaction. The COVID-19 pandemic is the best example of many healthcare organizations using HCA for good decision-making. By August 2020, the COVID-19 pandemic had taken the whole world by surprise, and there was no good treatment or vaccine available to ensure proper health care. Both healthcare providers and lawmakers faced the difficult task of making good decisions for patient care and for their organizations. Researchers, service providers and policymakers depended on Big Data Analytics for healthcare to help improve their procedures and services for patient care, delivery of resources and preventive health measures. Researchers also used Big Data analytics tools to forecast possible constraints on hospital capacity and resources. During this period of utter uncertainty, Big Data Analytics for healthcare played a very significant role, and effective vaccines were available by December 2020. Big Data models helped decision-makers decide on social distancing measures, school closure policies, testing capacity, contact-tracing strategies and mask-wearing by the population. Through such analytic models, policymakers could decide when and how to reopen businesses and schools, and how to distribute a vaccine to the various states within their countries.
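
The Industrial Analytics item above describes building a history of sensor readings and alerting the maintenance team when recent temperatures resemble a pre-failure pattern. One very simple way to approximate this, offered only as a sketch, is to flag a recent window whose average sits well above the historical baseline. The readings, the `pre_failure_alert` name and the three-standard-deviation threshold are assumptions, not the method of any particular manufacturer.

```python
from statistics import mean, stdev


def pre_failure_alert(history, recent, k=3.0):
    """Flag recent readings whose average exceeds baseline + k std deviations."""
    baseline = mean(history)
    spread = stdev(history)
    return mean(recent) > baseline + k * spread


# Hypothetical engine temperature readings in degrees Celsius.
history = [70, 71, 69, 70, 72, 71, 70, 69]

print(pre_failure_alert(history, [78, 80, 79]))  # True: alert the team
print(pre_failure_alert(history, [70, 71, 72]))  # False: within normal range
```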

Data Analytics for Insurance. Insurance companies have always depended on Data Analysis to monitor the satisfaction level of their customers and earn better revenue for their services. Different types of insurance companies such as Travel insurance companies, Health, Life insurance companies and Agriculture Insurance companies rely on statistics to categorize their customers. Accident statistics, policyholders’ personal information, and third-party sources help to classify and group people into different risk categories, prevent fraud losses, and optimize expenses. 

The availability of digital platforms has provided new sources of information that can be used to understand the complex behavioural patterns of a customer and precisely determine his or her segment. For insurance purposes, big data refers to unstructured and/or structured data. Insurance companies form their plan of action/business model on the basic idea of anticipating and diversifying risks. These companies work to guarantee insurance contracts for uncertain situations. Indeed, Big Data has revolutionized the insurance industry. Using BD Analytics allows insurance companies to target their customers more precisely and achieve higher customer satisfaction. The major activities of insurance companies that Big Data Analytics can impact are briefly given below:

·         Customer Acquisition.  Like every business, insurance companies need maximum customers to generate maximum revenue. This requires good outreach to woo customers and take/ retain a bigger share of the market for your product/ service. Therefore, the process of acquisition of customers should be made efficient and simpler. The Customer behaviour data collected from the web is unstructured data and forms part of Big Data. By using appropriate analytics, insurance companies can create targeted marketing campaigns that will acquire new customers. 

·         Customer Retention. A business is considered to be successful if its customer retention rate is higher. Insurance companies heavily rely on BD Analytics to retain their customers. Based on customer activity, algorithms can predict the early signs of customer dissatisfaction. Based on market intelligence provided, insurance companies can quickly react to improve their services and mitigate the grievances of any particular customer. Insurers can offer discounts or even change the pricing model for the client. 

·         Risk Assessment. Insurance companies always focus on verifying customers’ information while assessing risks. Big Data technology can improve the efficiency of the risk assessment process.

·         Fraud Prevention and Detection. Using predictive modelling techniques, insurers can compare a person's data against past fraudulent profiles and identify cases that require more investigation. Thus, insurance companies can protect themselves against such fraud.

·         Cost Reductions. Cost-cutting is one of the major considerations by any industry. Big data technology can play a leading role to automate manual processes, making them more efficient and reducing the costs spent on handling claims and administration. This will allow the companies to offer lower premiums to their clients and be more competitive. 

·         Personalized Service and Pricing. We all like to be treated specially, with personalized services. Life insurance companies using big data can become more personalized by looking at the medical history of a customer. Big data technology allows insurers to work quickly on a customer’s profile, decide on a suitable risk class, form a pricing model, automate claims processing and deliver the best services. A study by McKinsey shows that automation saves 43% of the time of insurance employees.

·         Travel Insurance. Innovations can simplify and speed up interaction with customers. Automation of communication improves customer satisfaction and facilitates speedy offers, which appear more beneficial to the customer.
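
The Fraud Prevention and Detection item above speaks of comparing a person's data against past fraudulent profiles. As a rough sketch of that comparison, each claim can be treated as a set of attributes and scored by its overlap with known fraud profiles. The attribute names, the Jaccard similarity measure and the 0.5 threshold are assumptions chosen for illustration; real predictive models are considerably richer.

```python
def jaccard(a, b):
    """Overlap of two attribute sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def needs_investigation(claim, fraud_profiles, threshold=0.5):
    """Flag a claim that closely resembles any past fraudulent profile."""
    return any(jaccard(claim, profile) >= threshold for profile in fraud_profiles)


# Hypothetical attribute sets, for illustration only.
fraud_profiles = [{"new_policy", "high_claim", "no_police_report"}]
suspicious = {"new_policy", "high_claim", "weekend_incident"}
ordinary = {"old_policy", "low_claim"}

print(needs_investigation(suspicious, fraud_profiles))  # True
print(needs_investigation(ordinary, fraud_profiles))    # False
```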

Jobs Creation. It is true that Cloud Computing and Big Data environments are taking away routine jobs in data entry and basic coding, but they are also offering new jobs, though ones needing a new skill set. This wave is unstoppable, and we all must learn new skills and adopt cloud computing and Big Data as the future computing environment. It is estimated that India will have a 32% share of the Big Data global market by 2025. Despite initial apprehension about data ownership, security and privacy, Cloud Computing and Big Data are the new business growth engines. Therefore, all young professionals must “Get Set and Go” to ride this wave of Big Data sweeping across the globe. Big Data is the key to business success for large organizations. It also offers energetic professionals many opportunities to become highly sought-after and well-paid Big Data consultants. You need good competency in skills like Hadoop, MapReduce, Pig, Hive, MongoDB, Ruby, Java, Python and R. Statistical analysis using the R language can help you in dealing with Business Intelligence (BI). There are many jobs in the Manufacturing, Banking, Healthcare, Transportation and Logistics, and Warehouse Management sectors, and many more. Likewise, there are many jobs in Government Departments (Finance, Meteorological, Agriculture, Environment, Labour) where Big Data is partially or fully adopted.

Big Data Impact on Workplaces. Today there is a data deluge, since data in the form of text, images, photographs and videos is being shared among millions of users across the globe on a 24x7 basis. Data is being transacted in an unstructured manner, in various formats and languages, and at great speed. This data includes Twitter feeds, posts and articles, comments and likes on LinkedIn, Facebook, Instagram and Telegram, and shared blogs and tech publications. As per the latest survey, Big Data is flowing at super speed: approximately 0.5 million comments are posted, 0.3 million statuses updated and 0.14 million photos uploaded to Facebook every minute. In addition, desktop computers, laptops, hand-held devices like tablets and smartphones, smart devices, sensors and IoT generate many varieties of data, such as emails, text messages, video content, voicemails, tweets and many other forms. When handled properly, such vast data (Big Data) can help the user in timely and accurate decision-making. However, this requires efficient analytical platforms that can handle such large volumes of data. Big Data will impact the work environment. The impact of Big Data on workplaces is briefly given below:

·         Customer Relationship. Existing Customer Relationship Management (CRM) software does a great job of providing many departments with an overview of their customers' business profiles. Data from CRM software can be used to personalize sales processes, target marketing campaigns, nurture customer relationships, and more. However, CRM software works particularly well with structured data, such as names, addresses, product history and other demographic information. Big data, on the other hand, consists mostly of unstructured data, such as sentiment expressed on social media networks. While structured data fits well in a database and can be quickly extracted for analysis, unstructured data needs different handling, since it is more diverse and arrives at random times. Integrating unstructured big data into your CRM system gives deeper insight into your customers, uncovering their shopping frequency, wish-lists and financial capacity. This information can help in predictive modelling, better customer segmentation and developing innovative experiences for the customer. This helps in promoting new products.

·          HR Hiring. Hard copy (paper) CVs have been replaced by digital contact forms, which are convenient to fill in, cost nothing, offer more ways to showcase your skill-set and reach the hiring agency immediately. Earlier, HR staff had the cumbersome task of manually sifting through all the received CVs, shortlisting candidates and then sending out initial interview emails/letters to the potential candidates. That is why smarter hiring practices are being adopted by leading hiring agencies, which use Big Data Analytics in conjunction with AI and ML. With available software, large volumes and varieties of data on potential candidates can be fed into a neural network. Neural systems can easily and quickly carry out an in-depth analysis of personal qualifications, experience, traits and soft skills. Big data analytics can track post-hiring performance and highlight which factors lead to a good selection versus a bad one. Thus, Big Data Analytics can take a considerable load off HR staff, who can devote their valuable time to other important tasks. This increases the productivity of HR staff, since fewer resources are needed to select the right candidate. Thus, Big Data will also help with the retention of good employees and with improvements in production, quality and growth.

·         Decision-Making. The new work mantra is “Work Smarter, not Harder.” In the age of data-driven businesses, working smarter through the use of data analytics gives you a competitive edge. Research firm IDC estimates that organizations that make it a priority to discover and analyze relevant data could generate an extra $430 billion in productivity by 2020. Big data analytics helps businesses make smarter decisions faster, allowing them to find disruptive opportunities the moment they arise. Of course, this means harnessing real-time data streams, which requires the ability to tell good-quality data from poor-quality data.

·         Office Space. Big data and IoT with smart connected devices can lead to more efficient and innovative office spaces.

·         Facility optimization.  IoT can be used to save energy. Embedded sensors at key areas throughout the office can detect which parts of the building are commonly used at which times of the day. Electrical components can be automated to power on or power off when necessary. Sensors outside the building can obtain real-time weather analytics to adjust for changes in temperature – prompting the AC or the heat to kick in.

·         Building Maintenance. This is another trending use of Big Data and IoT. When certain areas of the building are not performing at their optimum levels, sensors can raise alerts, and AI/ML can notify building administrators of the likely root cause of each problem before any costly damage occurs.
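The facility-optimization idea above can be pictured with a small rule-based sketch in Python. It is only an illustration, not a real building-management system: the `ZoneReading` class, the `decide_actions` function and all temperature thresholds are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ZoneReading:
    """One reading from the (hypothetical) sensors in an office zone."""
    zone: str
    occupied: bool
    indoor_temp_c: float


def decide_actions(reading, outdoor_temp_c,
                   comfort_low_c=21.0, comfort_high_c=25.0,
                   hot_outdoor_c=35.0, cold_outdoor_c=5.0):
    """Return on/off decisions for lights, AC and heat in one zone.

    All thresholds are illustrative assumptions, not recommended settings.
    """
    actions = {"lights": reading.occupied, "ac": False, "heat": False}
    if not reading.occupied:
        return actions  # empty zone: keep everything switched off

    # In extreme outdoor weather, react one degree earlier than usual.
    ac_threshold = comfort_high_c - (1.0 if outdoor_temp_c >= hot_outdoor_c else 0.0)
    heat_threshold = comfort_low_c + (1.0 if outdoor_temp_c <= cold_outdoor_c else 0.0)

    if reading.indoor_temp_c > ac_threshold:
        actions["ac"] = True
    elif reading.indoor_temp_c < heat_threshold:
        actions["heat"] = True
    return actions


meeting_room = ZoneReading(zone="meeting-room", occupied=True, indoor_temp_c=27.0)
print(decide_actions(meeting_room, outdoor_temp_c=30.0))
```

In practice the decisions would be driven by occupancy patterns learned from historical sensor data rather than fixed thresholds, but the control flow would look broadly similar.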

Issues related to adopting Big Data.  Adopting Big Data raises a number of issues for an organization, depending upon its business size, product range and business alliances. Some of the main issues faced by new entrants are briefly given below:

·      Accuracy. Big Data may contain some errors and is not suitable where absolute accuracy is crucial.

·       Applicability. Big Data is not one-size-fits-all and may not be suitable for small to medium industries.

·     Migration to Big Data. Existing organizations that use a client-server environment with a dedicated communication network are hesitant to migrate from the existing environment to the new (Big Data) environment. This is now made easier by software tools like Apache Sqoop, which is designed to work with RDBMSs like Oracle or MySQL. It supports two-way data exchange between an RDBMS or enterprise data warehouse and Hive or HBase. In addition, the service provider helps the customer with importing and exporting data.

·         Standardization. Although Big Data has been evolving for nearly 10 years to meet industry requirements, there are still no universal standards. It may take another 5-10 years to mature and arrive at universally accepted norms and industry standards.

·         Security of Data. Each of the V4 criteria poses its own challenge when analyzing data. It is the responsibility of the service provider or data handling organization / IT department to take care of all technical aspects and ensure that received data is secure, accurate, consistent and clean.

·         Shortage of experienced professionals. There is an acute shortage of experienced professionals who can lead the Big Data Project team and meet various requirements in handling the Big Data environment.

·         Steep Learning Curve. One option is to train existing manpower in Big Data skills and overcome HR shortages. However, it takes time to learn and build good competence in handling Big Data tools.
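For the migration issue above, a typical Sqoop import from an RDBMS table into Hive can be sketched as follows. This is a minimal illustration in Python that only assembles the command line and does not run it. The host name, database, table names and password-file path are hypothetical, and `build_sqoop_import` is a helper invented for this example; the flags themselves are standard options of the `sqoop import` tool.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file,
                       table, hive_table, num_mappers=4):
    """Assemble a `sqoop import` command that loads one RDBMS table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # avoids a password on the command line
        "--table", table,                   # source table in the RDBMS
        "--hive-import",                    # load the imported data into Hive
        "--hive-table", hive_table,         # destination Hive table
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="orders",
    hive_table="sales.orders",
)
print(shlex.join(cmd))
```

The reverse direction (Hadoop back to the RDBMS) uses the `sqoop export` tool with a similar set of connection options.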

Summary.  In the last ten years, social media has opened the floodgates for people across the world to interact 24x7, 365 days a year, from anywhere to anywhere. There are no geographical boundaries, time zones or fixed formats or types of data. Big Data is characterized by V4 (Velocity, Variety, Veracity and Volume), where data could be text, picture, image, video or audio. Much of this data is unstructured and non-relational, requiring a high-tech workforce to pre-process it and make it available to the decision-maker in the required format. Hadoop, MapReduce, Hive, Pig, Ruby, Python and MongoDB are among the new software tools available to handle the Big Data environment.

Big Data is an invaluable resource for business decision-making, research work, health care and agriculture. Big Data is also used by political parties to carry out propaganda and election campaigns. One example is a campaign in which one country used social media data to transmit false messages intended to influence US voters in favour of its preferred candidate.

Although Big Data is a very valuable information asset, it serves little purpose unless it is applied to problem-solving. As the Digital World continues to expand, business processes and practices will focus less on acquiring big volumes of data and more on digging out relevant data for better decision-making. Big Data needs the support of other technologies like Cloud Computing, IoT, AI and ML, and of software tools like Hadoop, MapReduce, Pig, Hive, Scala, Ruby and Python.

Big Data and Data Science are helping to evolve new business solutions. Big Data analytics has been one of the most popular IT trends since 2012. It is gaining widespread application in many industries, like Banking, Insurance, Healthcare, Transportation and Manufacturing, and in many government sectors, like the Departments of Finance, Healthcare, Metrology and Agriculture. The trend shows that in the next five years, the amount of unstructured data available to us will be huge. At the same time, analytics technology will become more advanced and will support our smart, fast and digital lifestyle.  In the near future, one may get a notification on his/her smartphone that he/she may soon encounter health issues. The notification may also prescribe suitable medicines. Such advances in technology are going to impact our lifestyle and human interaction.  The tidal wave of Big Data technology is unstoppable. It is for industry and organizations/institutes to adapt, learn to get the best out of it and start making better decisions.  An early start will be an advantage, while any hesitation or delay will mean a loss of business opportunity.
