In the first part of this blog series, I discussed
how the hype around analytics powered by big data has glossed over the critical
enablers and hard work needed to fulfill that promise. I also discussed how a
clear analytical capability divide currently separates the industry in terms of
the critical capabilities required for analytical success. I recognized the
need for an incremental roadmap to bridge that capability gap, as opposed to the
prevalent ‘big bang’ approach. On the one hand, the big bang approach fails to
acknowledge the varying degrees of implementation complexity among big data use
cases. On the other, it makes massive investments in building a data lake
foundation without a discernible ROI from use cases.
In this second part of the series, I draw on emerging data lake architecture
patterns to propose an incremental data ingestion pattern. Based on that
progression, I propose a reference architecture that preserves current
investments and supports gradual advancement: implementing use cases that
involve increasingly complex and varied types of data, and producing
increasingly impactful results along the way.
Choice of use cases: Are big data and analytics necessarily combined?
Big data, Hadoop, and analytics have become almost synonymous in today’s
parlance, and success with analytics has become the refrain of IT success. But
big data programs need not start with advanced analytics. Big data and Hadoop
have many other operational use cases that are more technical in nature and
are therefore less affected by the daunting impediments discussed in the
previous post. These use cases give organizations an excellent opportunity to
start their big data journey with far less existing capability.
Data warehouse (DW) off-loading is one such use case. As organizations
modernize their data warehouses into a multiplatform data warehouse environment
(DWE), the Hadoop-based data lake is emerging as a natural home for the huge,
detailed transactional data sets being relocated from the warehouse. With its
linearly scalable architecture, the data lake relieves the DWs of expensive
storage and computation load and enables discovery-oriented exploration and
analytics on these huge data sets in the Hadoop platform, capabilities that
business and data analysts are pining for today.
TDWI research on data lake use cases supports this hypothesis. Understandably,
49% of respondents cited advanced analytics as a use case for their data lakes,
but an equal 49% cited data exploration and discovery, followed by extending
their data warehouse (39%), staging for their data warehouse (36%), and data
warehouse off-loading and cost reduction (36%).
TCS has implemented many such technical use cases for organizations,
establishing the new, modernized, extended multiplatform DWE architecture with
the least disruption and the greatest effect. In fact, if you are early in the
big data adoption cycle, implementing these use cases is the only realistic way
to start building the required capability. The decision-making culture, business
alignment, data management practices, and in-house technical skills come with
practice rather than analysis. In that sense, these more technical and tactical
use cases become the stepping stone for most organizations to begin their
analytics journey. I will return to this idea when discussing the roadmap in
the next post.
Rethinking the ‘big bang’ approach to big data: An alternative to overflowing your data lake
Industry research on the types of data populating data lakes is consistent with
this proposition. According to a TDWI survey, managing exclusively big data and
other non-traditional data is still a minority practice for data lakes (15%),
whereas managing mostly traditional data is the most common practice (45%).
According to another TDWI survey of 473 respondents, 92% are managing
structured data, 65% are storing legacy data, 55% demographic and other
third-party data, 38% application logs, and 35% semi-structured data. More
exotic data types lag well behind: IoT data at only 6% and unstructured data at
only 12%.
TCS’s practice also indicates that organizations are most successful when they
follow a natural progression: starting with internal structured data and
gradually taking on greater complexity in volume, velocity, and variety (the
3Vs), in that order. The industry research above supports this observation.
This incremental approach lets the organization build up the capabilities it
needs: technical skills, domain expertise, enhanced data management processes,
and the organizational change that comes with consuming increasingly disruptive
analytic insights distilled from data of growing complexity across the 3Vs.
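To make the “in that order” progression concrete, here is a minimal Python sketch of one possible reading of it; it is not a TCS method. Each candidate source carries hypothetical flags for the 3Vs, and sources are scheduled in waves by the latest V they demand, so that structured internal data comes first and variety-heavy sources come last. The `DataSource` fields, the staging rule, and the source names are all illustrative assumptions.

```python
from dataclasses import dataclass

# Order in which complexity is taken on, per the progression described above.
V_ORDER = ("volume", "velocity", "variety")


@dataclass(frozen=True)
class DataSource:
    """A candidate source for the lake; these flags are illustrative assumptions."""
    name: str
    high_volume: bool = False
    high_velocity: bool = False
    high_variety: bool = False


def stage(source: DataSource) -> int:
    """0 = internal structured data only; otherwise the 1-based position of the
    latest V (in V_ORDER) that the source demands."""
    latest = 0
    for position, v in enumerate(V_ORDER, start=1):
        if getattr(source, f"high_{v}"):
            latest = position
    return latest


def plan_ingestion(sources):
    """Group source names into waves keyed by stage, so that, for example,
    variety-heavy sources are scheduled after volume- and velocity-heavy ones."""
    waves = {}
    for src in sorted(sources, key=lambda s: (stage(s), s.name)):
        waves.setdefault(stage(src), []).append(src.name)
    return waves


if __name__ == "__main__":
    # Hypothetical sources, for illustration only.
    sources = [
        DataSource("erp_orders"),
        DataSource("clickstream", high_volume=True, high_velocity=True),
        DataSource("pos_history", high_volume=True),
        DataSource("call_audio", high_variety=True),
    ]
    print(plan_ingestion(sources))
```

Run as a script, it places `erp_orders` in the first wave, followed by `pos_history`, `clickstream`, and `call_audio`. A real roadmap would, of course, weigh business value and readiness alongside these technical flags.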
A ‘big bang’ data lake program risks failure: populating the lake with all
enterprise data, regardless of whether valid use cases exist, will yield poor
ROI and create severe governance and data management challenges. Building an
enterprise data lake demands a data-driven management culture, technology
investments, new decision procedures, redesigned roles, and expertise, all of
which are costly and take time to develop. Here too, bridging the capability
chasm is an incremental affair. I will return to this line of thought when
defining the roadmap.
The big data architecture pattern: A multiplatform DWE for gradual complexity and maximum ROI
The diversity of data types and processing workloads is driving today’s
multiplatform DWE architectures. A multiplatform DWE gives users options, so
they can choose a platform whose storage, performance, and price
characteristics match a given data type or workload.
A recent TDWI report revealed that 17% of surveyed data warehouse programs
already have Hadoop in production alongside a relational data warehouse
environment, where the relational data warehouse and the Hadoop-based data lake
coexist with tight integration and complement each other, because the strengths
of one compensate for the weaknesses of the other. They simply provide
different sets of functions, giving organizations twice the options.
The two platforms also play complementary roles in how data is used. For
example, financial reports that demand accuracy down to the penny and lineage
that is unassailable in an audit will remain in the data warehouse, which is
why relational enterprise data warehouses (EDWs) remain strongly relevant
today. There, the data elements, their relationships, and their derivations,
which are mostly very complex, are fully understood beforehand. By contrast,
early ingestion and the data preparation practices that go with it are better
suited to discovery analytics, which tends to be the top priority for a data
lake. According to the “TDWI Checklist Report: Emerging Best Practices for Data
Lakes,” the outputs of such analytics are, by nature, estimates and
generalizations (e.g., customer segments and entity clusters suggesting fraud),
as opposed to the precision that financial reports require.
Naturally, the simple DWE, consisting of a central EDW with a few additional
data platforms, has become the systems architecture norm, and according to
TDWI research it will remain the norm for some years. The architectural
complexity of a DWE will increase as more big data types are progressively
brought in. DWEs will start simple, with a handful of platforms, and evolve into
complex DWEs integrating a dozen or more, as newer platforms are added to
handle data of ever-greater velocity and variety.
Recommended Reference Architecture
The diagram above shows, at a high level, TCS’s recommended modern hybrid,
integrated, multiplatform data warehouse environment and its data flows. This
architectural pattern off-loads the exploration, exploitation, storage, and
processing of high-volume structured, semi-structured, and unstructured data to
its Hadoop layer, and leaves the complex processing of ‘small’ data to the
relational layer, which handles it best.
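The placement rule described above can be sketched as a small routing function. This is a minimal Python illustration, not TCS’s reference implementation: the `Workload` fields, the `Platform` names, and the 10 TB cut-off for “high volume” are hypothetical assumptions, and real placement decisions weigh many more factors (cost, latency, skills, existing contracts).

```python
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    RELATIONAL_EDW = "relational EDW"
    HADOOP_LAKE = "Hadoop data lake"


STRUCTURES = {"structured", "semi-structured", "unstructured"}

# Hypothetical cut-off for "high volume"; each organization would set its own.
HIGH_VOLUME_TB = 10.0


@dataclass(frozen=True)
class Workload:
    """Illustrative description of a data set and its use (assumed fields)."""
    name: str
    structure: str              # one of STRUCTURES
    size_tb: float
    needs_audit_lineage: bool   # e.g. penny-accurate financial reporting


def place(w: Workload) -> Platform:
    """Route a workload to one layer of the hybrid DWE."""
    if w.structure not in STRUCTURES:
        raise ValueError(f"unknown structure: {w.structure!r}")
    # Audit-grade reporting stays in the relational warehouse.
    if w.needs_audit_lineage:
        return Platform.RELATIONAL_EDW
    # Semi-structured and unstructured data go to the Hadoop layer.
    if w.structure != "structured":
        return Platform.HADOOP_LAKE
    # High-volume structured detail (the DW off-loading case) goes to the lake;
    # "small" structured data keeps its complex processing in the EDW.
    if w.size_tb >= HIGH_VOLUME_TB:
        return Platform.HADOOP_LAKE
    return Platform.RELATIONAL_EDW
```

For example, `place(Workload("web_logs", "semi-structured", 3.0, False))` returns `Platform.HADOOP_LAKE`, while a small customer dimension table without audit requirements stays on `Platform.RELATIONAL_EDW`. Checking audit lineage first reflects the point that penny-accurate financial reporting remains in the warehouse regardless of size.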
This architecture avoids the expensive and time-consuming step of copying all
enterprise data into the data lake. That step is redundant, because big data
connectors are available for all established relational databases; these
connectors allow an analytical data flow or an ETL process to access both data
stores seamlessly. The architecture also keeps sensitive data within an
organization’s secure enterprise storage systems: security and governance on
the Hadoop layer need to be applied only to the individual relational data sets
copied over on a use-case basis, which makes control easier.
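The use-case-driven alternative to bulk copying can be sketched as a planning step. In this minimal Python sketch, every name (`UseCase`, `CopyPlan`, `plan_copies`, and the data set names) is hypothetical. The rule that sensitive data sets are never copied is an assumption drawn from the point about keeping sensitive data in enterprise storage. The connector itself is not modeled; “in place” simply means the data set is read from the EDW through one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class UseCase:
    """An approved use case and the relational data sets it needs (illustrative)."""
    name: str
    reads_in_place: frozenset[str] = frozenset()     # read from the EDW via a connector
    processes_in_lake: frozenset[str] = frozenset()  # needs a lake copy for large-scale work


@dataclass
class CopyPlan:
    # Data set -> use cases that justify copying it; each entry is the unit to
    # which lake-side security and governance would be applied.
    copies: dict[str, list[str]] = field(default_factory=dict)
    # Data sets that stay in enterprise storage and are reached through connectors.
    in_place: set[str] = field(default_factory=set)


def plan_copies(use_cases, sensitive):
    """Copy only what approved use cases need, never the whole enterprise."""
    justifications: dict[str, set[str]] = {}
    plan = CopyPlan()
    for uc in use_cases:
        plan.in_place.update(uc.reads_in_place)
        for ds in uc.processes_in_lake:
            if ds in sensitive:
                # Assumed policy: sensitive data is not copied; the use case
                # reaches it through the connector instead.
                plan.in_place.add(ds)
            else:
                justifications.setdefault(ds, set()).add(uc.name)
    plan.copies = {ds: sorted(names) for ds, names in sorted(justifications.items())}
    # A data set copied for any use case is no longer listed as in-place.
    plan.in_place.difference_update(plan.copies)
    return plan


if __name__ == "__main__":
    # Hypothetical use cases and data sets, for illustration only.
    use_cases = [
        UseCase("dw_offload", processes_in_lake=frozenset({"txn_detail"})),
        UseCase(
            "churn_discovery",
            reads_in_place=frozenset({"customer_dim"}),
            processes_in_lake=frozenset({"txn_detail", "customer_pii"}),
        ),
    ]
    plan = plan_copies(use_cases, sensitive={"customer_pii"})
    print(plan.copies)
    print(sorted(plan.in_place))
```

With these example inputs, only `txn_detail` is copied, justified by both use cases, while `customer_dim` and the sensitive `customer_pii` stay in enterprise storage. Recording the justifying use cases against each copy is what lets governance be scoped per data set rather than per lake.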
This architecture also protects the investment in the relational data
warehouses and makes full use of them in the new environment. It reduces risk by
causing the least disruption to existing implementations, and it maximizes ROI
by avoiding unnecessary investment in storing enterprise data sets that are best
left where they are.
Towards the roadmap
On a roadmap for big data and analytics, successive use cases should place
increasing demands on implementation capability and capacity, as well as on the
degree of organizational change required. The roadmap should start with
technical use cases that require the least additional skill and have a
positive, if minimal, impact on business processes. It then gradually evolves
toward use cases that demand more internal capability and capacity and have
wider business impact, until, in the final stage, big data and analytics are
adopted in the organization’s strategic planning.
In the next post, I will expand on a four-phase analytics strategy and roadmap.
I will outline a progressive approach in which each phase increases both the
complexity of the big data implementation capability required and the degree of
organizational change, measured by how deeply it intrudes on current business
processes.
Thanks to Suman Ghosh @TCS for enlightening us with this article.