Friday, December 22, 2017

The Big Data Architecture Roadmap - An Incremental Approach - Part 2

In the first part of this blog series, I discussed how the hype around analytics powered by big data has glossed over the critical enablers and hard work necessary to fulfill that promise. I also discussed how a clear divide in analytical capability currently separates organizations across the industry. I argued for an incremental roadmap to bridge that capability gap, as opposed to the prevalent ‘big bang’ approach, which fails in two ways: it does not acknowledge that big data use cases vary widely in implementation complexity, and it makes massive investments in building a data lake foundation without a discernible ROI from use cases.
In this second part of the blog, I refer to the emerging data lake architecture patterns to propose an incremental data induction pattern. Based on that progression, I will propose a reference architecture that preserves current investment and supports gradual advancement, implementing use cases involving more complex and varied types of data, incrementally producing more and more impactful results.

Choice of use cases: Are big data and analytics necessarily combined?

Big data, Hadoop, and analytics have become almost synonymous in today’s parlance, and success with analytics is the refrain of today’s IT success. But big data programs do not necessarily need to start with advanced analytics. Big data and Hadoop have many other operational use cases that are more technical in nature and are therefore less affected by the daunting impediments discussed in the previous blog. They provide an excellent opportunity for organizations to start on their big data voyage with much less capability.
Data warehouse (DW) off-loading is one such use case. As organizations modernize their data warehouses into a multiplatform data warehouse environment (DWE), the Hadoop-based data lake is emerging as a natural home for the huge, detailed transactional data sets being relocated out of the warehouse. The data lake, with its linearly scalable architecture, relieves the DW’s expensive storage and computation resources and enables discovery-oriented exploration and analytics on these huge data sets in the Hadoop platform, capabilities that business and data analysts are pining for today.
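As a rough illustration of the off-loading idea, the sketch below moves cold detail rows out of a warehouse table into month-partitioned files. Everything in it is an assumption made for the example: an in-memory SQLite database stands in for the relational warehouse, a temporary directory of CSV files stands in for the Hadoop-based lake, and the table name `sales_detail`, its columns, and the cutoff date are hypothetical. A real off-load would typically use a connector or bulk-export tool and a columnar file format.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def offload_cold_rows(conn, lake_dir, cutoff_date):
    """Copy detail rows older than cutoff_date into month-partitioned CSV
    files under lake_dir, then delete them from the warehouse table.
    Returns the number of rows off-loaded."""
    rows = conn.execute(
        "SELECT txn_id, txn_date, amount FROM sales_detail "
        "WHERE txn_date < ? ORDER BY txn_date, txn_id",
        (cutoff_date,),
    ).fetchall()
    for txn_id, txn_date, amount in rows:
        # Partition by month (YYYY-MM), mimicking a common lake layout.
        partition = Path(lake_dir) / f"month={txn_date[:7]}"
        partition.mkdir(parents=True, exist_ok=True)
        with open(partition / "part-0000.csv", "a", newline="") as fh:
            csv.writer(fh).writerow([txn_id, txn_date, amount])
    # Only after the copy succeeds are the rows removed from the warehouse.
    conn.execute("DELETE FROM sales_detail WHERE txn_date < ?", (cutoff_date,))
    conn.commit()
    return len(rows)


def demo_offload():
    # Hypothetical warehouse table with ISO-formatted date strings,
    # so plain string comparison orders the dates correctly.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_detail (txn_id INTEGER, txn_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_detail VALUES (?, ?, ?)",
        [
            (1, "2016-11-03", 20.0),
            (2, "2016-12-15", 35.5),
            (3, "2017-11-20", 12.0),
            (4, "2017-12-01", 99.9),
        ],
    )
    with tempfile.TemporaryDirectory() as lake_dir:
        moved = offload_cold_rows(conn, lake_dir, "2017-01-01")
        partitions = sorted(p.name for p in Path(lake_dir).iterdir())
    remaining = conn.execute("SELECT COUNT(*) FROM sales_detail").fetchone()[0]
    conn.close()
    return moved, remaining, partitions
```

Calling `demo_offload()` returns the number of rows moved, the number left in the warehouse, and the partition directories created. The point of the sketch is the shape of the operation: recent, frequently queried rows stay in the relational layer while the bulky history lands in cheap, partitioned storage that remains available for exploration.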


TDWI research on data lake use cases supports this hypothesis. While 49% of the respondents understandably mentioned advanced analytics as the use case for their data lakes, another 49% mentioned data exploration and discovery, followed by 39% who use the lake as an extension of their data warehouse, 36% as staging for their data warehouse, and 36% for data warehouse off-load and cost reduction.
TCS has implemented many such strong technical use cases for organizations, establishing the new, modernized, extended multiplatform DWE architecture with the least disruption and most effect. In fact, if you are early in the big data adoption cycle, implementing these use cases is the only realistic way to start building the requisite capability. The decision-making culture, the business alignment, the data management, and the in-house technical skills come with practice rather than analysis. In that sense, these more technical and tactical use cases become the stepping stone for most organizations to begin their analytics journey. We will take up this thought in discussing the roadmap in the next blog.

Rethinking the ‘big bang’ approach to big data: An alternative to overflowing your data lake

Industry research on the types of data populating data lakes accords with the above proposition. According to a TDWI survey, the exclusive management of big data and other non-traditional data is still a minority practice for data lakes (15%), whereas managing mostly traditional data is the majority practice (45%).
According to another TDWI survey, 92% of 473 respondents are managing structured data, 65% are storing legacy data, 55% demographic and other third-party data, 38% application logs, and 35% semi-structured data. More exotic data types, IoT and unstructured data, lag at only 6% and 12% respectively.
TCS’ practice also indicates that organizations are most successful when they adopt a natural progression, starting from internal structured data and gradually ingesting data of increasing complexity in terms of volume, velocity, and variety (the 3Vs), in that order. This observation is supported by the industry research above. The incremental approach gives the organization time to build up the requisite capabilities: technical skills, domain expertise, enhanced data management processes, and the organizational change that comes with increasingly disruptive consumption of the analytic insights distilled from increasingly complex data.
A ‘big bang’ data lake program invites the risk of failing: populating it with the entire enterprise data irrespective of valid use cases will have poor ROI and present extreme governance and data management challenges. Building an enterprise data lake demands a data-driven management culture, technology investments, new decision procedures, redesigned roles, and expertise that is costly and takes time to develop. Bridging the capability chasm here, too, is an incremental affair. I will take up this train of thought in the roadmap definition.

The big data architecture pattern: A multiplatform DWE for gradual complexity and maximum ROI

The diversity of data types and workload processing is driving today’s multiplatform DWE architectures. It gives users options so they can choose a platform with the storage, performance, and price characteristics that match a given data type or workload.
A recent TDWI report revealed that 17% of surveyed data warehouse programs already have Hadoop in production alongside a relational data warehouse environment, where the relational data warehouse and the Hadoop-based data lake coexist, are tightly integrated, and complement each other. That’s because the strengths of one compensate for the weaknesses of the other. They simply provide different sets of functions, thereby giving organizations twice the options.
These two platforms also play complementary roles in how the data is used. For example, financial reports that demand accuracy down to the penny and a lineage that’s unassailable in an audit will remain in the data warehouses. That’s why relational enterprise data warehouses (EDWs) remain strongly relevant today. Here, the data elements, their relationships, and their derivations, which are mostly very complex, are understood completely beforehand. In contrast, early ingestion and the data prep practices that go with it are more appropriate for discovery analytics, which tends to be the top priority for a data lake. The outputs of such analytics are, by nature, estimates and generalizations (e.g., customer segments and entity clusters suggesting fraud), as opposed to the accuracy required of financial reports, according to the report “TDWI Checklist Report: Emerging Best Practices for Data Lakes.”
Naturally, the simple DWE, a central EDW with a few additional data platforms, has now become the systems architecture norm, and it will continue to be the norm for some years according to TDWI research. The architectural complexity of a DWE will increase with the progressive induction of more big data types. DWEs will start simple with a handful of platforms and evolve into complex DWEs integrating a dozen or more, as newer platforms are added to handle greater velocity and variety of data.
Recommended Reference Architecture
The diagram above shows TCS’ recommendation for the modern hybrid, integrated, multiplatform data warehouse environment and its data flows at a high level. This architectural pattern off-loads exploration, exploitation, storage, and processing of high-volume structured, semi-structured, and unstructured data to its Hadoop layer, and leaves the complex processing of ‘small’ data to the relational layer, which does that best.
This architecture avoids the expensive and time-consuming step of copying the entire enterprise data to the data lake. That step is redundant now that big data connectors are available for all established relational databases. These connectors allow an analytical data flow or an ETL process to access both data stores seamlessly. This architecture also keeps sensitive data within an organization’s secure enterprise storage systems. Security and governance on the Hadoop layer then need to be applied only to the individual relational data sets copied over on a use-case basis, which makes control easier.
This architecture also protects the investment in the relational data warehouses and uses them to the fullest extent in the new environment. It reduces risk by causing the least disruption to existing implementations, and provides the best ROI by avoiding unnecessary investment in storing enterprise data sets that are best left where they are at present.
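The connector-based access described in this section, where one data flow reads from both stores instead of copying the warehouse wholesale, can be sketched roughly as follows. This is a minimal stand-in under stated assumptions, not a real connector: an in-memory SQLite database plays the relational warehouse, a Python list of event records plays the high-volume data on the Hadoop layer, and the names `customer_dim`, `customer_id`, and `top_customers_by_clicks` are hypothetical.

```python
import sqlite3
from collections import Counter


def top_customers_by_clicks(warehouse, lake_events, top_n):
    """Aggregate high-volume events on the lake side, then fetch only the
    matching reference rows from the warehouse through its connection."""
    # Step 1: heavy aggregation happens where the bulky data lives.
    clicks = Counter(
        event["customer_id"] for event in lake_events if event["type"] == "click"
    )
    top = clicks.most_common(top_n)
    if not top:
        return []
    ids = [customer_id for customer_id, _ in top]
    # Step 2: pull just the needed 'small' data; nothing is bulk-copied.
    placeholders = ",".join("?" for _ in ids)
    names = dict(
        warehouse.execute(
            "SELECT customer_id, name FROM customer_dim "
            f"WHERE customer_id IN ({placeholders})",
            ids,
        ).fetchall()
    )
    return [(names.get(customer_id, "unknown"), n) for customer_id, n in top]


def demo_top_customers():
    # Hypothetical warehouse dimension table.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customer_dim (customer_id INTEGER, name TEXT)")
    warehouse.executemany(
        "INSERT INTO customer_dim VALUES (?, ?)",
        [(1, "Asha"), (2, "Ben"), (3, "Chen")],
    )
    # Hypothetical clickstream records standing in for lake data.
    lake_events = [
        {"customer_id": 1, "type": "click"},
        {"customer_id": 3, "type": "click"},
        {"customer_id": 1, "type": "click"},
        {"customer_id": 2, "type": "view"},
        {"customer_id": 2, "type": "click"},
        {"customer_id": 3, "type": "click"},
        {"customer_id": 1, "type": "click"},
    ]
    result = top_customers_by_clicks(warehouse, lake_events, top_n=2)
    warehouse.close()
    return result
```

The design choice worth noticing is the direction of data movement: the large event set is reduced first, and only a handful of keys cross over to the relational side. Sensitive reference data never leaves the warehouse except for the few rows a given use case actually needs.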

Towards the roadmap

In terms of the roadmap for big data and analytics, the use cases should make increasing demands on implementation capability and capacity, as well as on the degree of organizational change required. The roadmap should start with technical use cases that require the least additional skill and have a positive, if minimal, impact on business processes. It then gradually evolves into use cases that demand more internal capability and capacity and have wider business impact, culminating in adoption for the organization’s strategic planning.
In the next blog, I will expand on a four-phase analytics strategy and roadmap. I will outline a progressive approach in which each phase demands more big data implementation capability and a greater degree of organizational change, as the use cases become more intrusive on current business processes.
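One way to picture this ordering is to score candidate use cases on the two dimensions named above and sort them. The sketch below does only that. The 1-to-5 scores are illustrative placeholders invented for the example, not survey results or TCS figures, and the `UseCase` class and `build_roadmap` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    capability_needed: int  # 1 (low) to 5 (high); illustrative score only
    org_change: int         # 1 (low) to 5 (high); illustrative score only


def build_roadmap(use_cases):
    """Order use cases from least to most demanding, breaking ties by the
    degree of organizational change and then by name."""
    return sorted(
        use_cases,
        key=lambda u: (u.capability_needed + u.org_change, u.org_change, u.name),
    )


def demo_roadmap():
    # Candidate use cases with made-up scores, listed in no particular order.
    candidates = [
        UseCase("advanced analytics", capability_needed=5, org_change=4),
        UseCase("DW off-load", capability_needed=1, org_change=1),
        UseCase("strategic planning", capability_needed=5, org_change=5),
        UseCase("data exploration and discovery", capability_needed=3, org_change=2),
    ]
    return [u.name for u in build_roadmap(candidates)]
```

With these placeholder scores, the technical off-load use case comes first and strategic planning comes last, mirroring the progression described above. An organization would substitute its own assessment of each use case.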

Thanks to Suman Ghosh @TCS for enlightening us with this article.


