In the first part of this blog series, I discussed how the hype around analytics powered by big data has glossed over the critical enablers and hard work necessary to fulfill that promise. I also discussed how a clear analytical capability divide currently separates the industry in terms of the capabilities requisite for analytical success. I argued for an incremental roadmap to bridge that capability gap, as opposed to the prevalent ‘big bang’ approach, which, on the one hand, fails to acknowledge the widely varying implementation complexity of big data use cases and, on the other, pours massive investments into building foundational data lakes without a discernible ROI in use cases.
In this second part of the blog, I draw on emerging data lake architecture patterns to propose an incremental data induction pattern. Based on that progression, I will propose a reference architecture that preserves current investment and supports gradual advancement, enabling use cases that involve increasingly complex and varied types of data and produce progressively more impactful results.
Choice of use cases: Are big data and analytics necessarily combined?
Big data, Hadoop, and analytics have become almost synonymous in today’s parlance, and success with analytics is the refrain of today’s IT success stories. But big data programs do not necessarily need to start with advanced analytics. Big data and Hadoop have many other operational use cases that are more technical in nature, and therefore less affected by the daunting impediments discussed in the previous blog. They provide an excellent opportunity for organizations to start on their big data voyage with far less up-front capability.
Data warehouse (DW) off-loading is one such use case. Here, the Hadoop-based data lake is emerging as a natural fit for the huge, detail-level transactional data sets that are being relocated to the data lake as organizations modernize their data warehouses into a multiplatform data warehouse environment (DWE). The data lake, with its linearly scalable architecture, relieves the DW of expensive storage and computation load and enables discovery-oriented exploration and analytics on these huge data sets in the Hadoop platform, capabilities that business and data analysts are pining for today.
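As a minimal illustration of the off-loading pattern, the PySpark sketch below extracts a detail-level transactional table from a relational DW over JDBC and lands it in the lake as partitioned Parquet. It is a sketch under stated assumptions, not a definitive implementation: it assumes a Spark cluster with the source database's JDBC driver on the classpath, and all host, table, column, and path names are hypothetical.

```python
# A minimal sketch, not a production pipeline. Assumes a Spark cluster with the
# source DW's JDBC driver available; all hosts, tables, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dw-offload-sketch")
         .getOrCreate())

# Pull the detail-level transactional table from the relational DW over JDBC,
# partitioning the read so the extract runs in parallel across the cluster.
sales_detail = (spark.read.format("jdbc")
                .option("url", "jdbc:postgresql://dw-host:5432/dwprod")  # hypothetical endpoint
                .option("dbtable", "sales.transaction_detail")          # hypothetical table
                .option("user", "etl_user")
                .option("password", "...")                               # placeholder credential
                .option("partitionColumn", "txn_id")                     # hypothetical numeric key
                .option("lowerBound", 1)
                .option("upperBound", 1000000000)
                .option("numPartitions", 32)
                .load())

# Land the data in the lake as date-partitioned Parquet, where Hive, Spark, or
# Impala can query it without touching the DW again.
(sales_detail.write
 .mode("append")
 .partitionBy("txn_date")
 .parquet("hdfs:///datalake/raw/sales/transaction_detail"))
```

Once landed, analysts can explore the full-detail history on the Hadoop layer while the DW retains only what its workloads actually need.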
TDWI research on data lake use cases
supports this hypothesis. While 49% of the respondents understandably mentioned advanced analytics as the use case for their data lakes, another 49% mentioned data exploration and discovery, followed by 39% citing an extension of their data warehouse, 36% staging for their data warehouse, and 36% data warehouse off-load and cost reduction.
TCS has implemented
many such strong technical use cases for organizations, establishing the new, modernized, extended multiplatform DWE architecture with the least disruption and the greatest effect. In fact, if you are early in the big data adoption cycle,
implementing these use cases is the only realistic way to start building the
requisite capability. The decision-making culture, the business alignment, the
data management, and the in-house technical skills come with practice rather
than analysis. In that sense, these more technical and tactical use cases
become the stepping stone for most organizations to begin their analytics
journey. We will take up this thought in discussing the roadmap in the next
blog.
Rethinking the ‘big bang’ approach to big data: An alternative to
overflowing your data lake
Industry research on the types of data populating data lakes accords with this proposition. According to a TDWI survey, exclusively managing big data and other non-traditional data is still a minority practice for data lakes (15%), whereas managing mostly traditional data is the majority practice (45%). According to another TDWI survey, 92% of 473 respondents are managing structured data, 65% are storing legacy data, 55% demographic and other third-party data, 38% application logs, and 35% semi-structured data. More exotic data types, such as IoT and unstructured data, lag at only 6% and 12%, respectively.
TCS’ practice also indicates that organizations are most successful when they adopt a natural progression: starting with internal structured data and gradually ingesting data of increasing complexity in terms of volume, velocity, and variety (the 3Vs), in that order. This observation is supported by the industry research above. The incremental approach allows the organization to build up the requisite capabilities: technical skills, domain expertise, enhanced data management processes, and the organizational change that accompanies the increasingly disruptive consumption of analytic insights distilled from ever more complex data across the 3Vs.
A ‘big bang’ data lake program invites the risk of failure: populating the lake with all enterprise data, irrespective of valid use cases, will yield poor ROI and present extreme governance and data management challenges. Building an enterprise data lake demands a data-driven management culture, technology investments, new decision procedures, redesigned roles, and expertise that is costly and takes time to develop. Bridging the capability chasm here, too, is an incremental affair. I will take up this train of thought in the roadmap definition.
The big data architecture pattern: A multiplatform DWE for gradual
complexity and maximum ROI
The diversity of data types and workload processing is driving today’s multiplatform DWE architectures. These architectures give users options, so they can choose a platform whose storage, performance, and price characteristics match a given data type or workload.
A recent TDWI report revealed that 17% of surveyed data warehouse programs already have Hadoop in production alongside a relational data warehouse environment, where the relational data warehouse and the Hadoop-based data lake coexist in tight integration and complement each other. That is because the strengths of one compensate for the weaknesses of the other. They simply provide different sets of functions, thereby giving organizations twice the options.
These two platforms also play complementary roles in terms of data usage. For example, financial reports that demand accuracy down to the penny and a lineage that is unassailable in an audit will remain in the data warehouses. That is why relational enterprise data warehouses (EDWs) remain strongly relevant today: here, the data elements, their relationships, and their often very complex derivations are completely understood beforehand. In contrast, early ingestion and the data prep practices that go with it are better suited to discovery analytics, and they tend to be the top priority for a data lake. The outputs of such analytics are, by nature, estimates and generalizations (e.g., customer segments and entity clusters suggesting fraud), as opposed to the requisite accuracy of financial reports, according to the report “TDWI Checklist Report: Emerging Best Practices for Data Lakes.”
Naturally, the simple DWE, consisting of a central EDW with a few additional data platforms, has become the systems architecture norm, and it will continue to be the norm for some years according to TDWI research. The architectural complexity of a DWE will increase with the progressive induction of more big data types. DWEs will start simple, with a handful of platforms, and evolve into complex DWEs integrating a dozen or more, as newer platforms are added to induct data of ever greater velocity and variety.
Recommended
Reference Architecture
The diagram above shows TCS’ recommendation for the modern hybrid, integrated, multiplatform data warehouse environment and its data flows at a high level. This architectural pattern off-loads the exploration, exploitation, storage, and processing of high-volume structured, semi-structured, and unstructured data to its Hadoop layer, and leaves the complex processing of ‘small’ data to the relational layer, which handles it best.
This architecture avoids the expensive and time-consuming step of copying the entire enterprise data estate to the data lake; that step is redundant now that big data connectors are available for all established relational databases. These connectors allow an analytical data flow or an ETL process to access both data stores seamlessly. This architecture also keeps sensitive data within an organization’s secure enterprise storage systems: security and governance on the Hadoop layer need only be applied to the individual relational data sets copied over on a use-case basis, making control easier.
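As a hedged sketch of what such a connector-based flow might look like using Spark's generic JDBC reader (the endpoint, schema, and column names are all hypothetical, and any JDBC-capable relational database would work along the same lines), the example below joins high-volume clickstream data already in the lake with a customer dimension that stays in the relational store:

```python
# A minimal sketch of a hybrid analytical flow; connector, schema, and column
# names are hypothetical, not a specific vendor's API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hybrid-query-sketch").getOrCreate()

# High-volume clickstream events already landed in the Hadoop layer.
clicks = spark.read.parquet("hdfs:///datalake/raw/web/clickstream")

# The small, governed customer dimension stays in the relational DW and is read
# on demand through the connector instead of being copied wholesale.
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://dw-host:5432/dwprod")  # hypothetical endpoint
             .option("dbtable", "dim.customer")                       # hypothetical table
             .option("user", "analyst")
             .option("password", "...")                               # placeholder credential
             .load())

# One data flow spans both platforms; only the dimension rows needed for this
# analysis ever leave the secure relational store.
enriched = clicks.join(customers, on="customer_id", how="left")  # assumes a shared customer_id key
enriched.groupBy("customer_segment").count().show()
```

The design point the sketch illustrates is that the lake and the DW are queried in place within a single data flow, rather than one being bulk-copied into the other.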
This architecture also protects the investment in the relational data warehouses and uses them to the fullest extent in the new environment. It reduces risk by minimizing disruption to existing implementations, and it provides the best ROI by avoiding unnecessary investment in storing enterprise data sets that are best left where they are at present.
Towards the roadmap
In the roadmap for big data and analytics, use cases should be sequenced by their increasing demands on implementation capability and capacity, as well as the degree of organizational change they require. The roadmap should start with technical use cases that require the least additional skill and have a positive, if minimal, impact on business processes. It then gradually evolves toward use cases that demand more internal capability and capacity and have wider business impact, culminating in adoption for the organization’s strategic planning.
In the next blog, I will expand on a four-phase analytics strategy and roadmap, outlining a progressive approach in which the use cases grow in big data implementation complexity and in the degree of organizational change as they become more intrusive on current business processes.
Thanks to Suman Ghosh @TCS for enlightening us with this article.