Showing posts with label Cloud. Show all posts

Friday, April 26, 2024

Snowflake Optimization on top of any Table

There are several optimization techniques you can implement on a table to reduce storage and compute costs:

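1. Partitioning: Partitioning your table can significantly improve query performance and reduce compute costs, because queries can skip data they do not need. You can partition your table based on a specific column, such as date, so that data is stored in separate parts. This is particularly useful if you often query data for specific date ranges. In Snowflake, tables are divided into micro-partitions automatically, so the equivalent step is defining a clustering key on such a column.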

2. Indexing: Proper indexing can greatly improve query performance. However, it's important to find the right balance, as too many indexes can increase storage costs and slow down write operations. Indexes should be created on columns that are frequently used in WHERE, JOIN, and ORDER BY clauses. Note that standard Snowflake tables do not support user-defined indexes, so this technique applies to database systems that do.

3. Data Compression: Depending on the database system you're using, you might be able to use data compression techniques to reduce storage costs. This can include techniques like row-level or page-level compression, or even using columnar storage for large data warehousing workloads.

4. Data Archiving and Purging: If your table contains historical data that is no longer needed for day-to-day operations, consider archiving or purging this data. This can significantly reduce storage costs and improve query performance on the remaining data.

5. Normalization: If your table contains redundant data, consider normalizing it. This involves splitting the table into two or more tables and defining relationships between them. This can reduce storage costs and improve data integrity, but it can also increase the complexity of your queries.

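6. Use Appropriate Data Types: Using the appropriate data type for each column can also help reduce storage costs. For example, in many database systems a smaller integer type (like INT instead of BIGINT) saves space if the larger range of values isn't needed. In Snowflake, INT and BIGINT are both synonyms for NUMBER(38,0), so this particular saving does not apply there.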

7. Column Store Indexes: If you're using a database system that supports column store indexes, these can provide significant performance improvements for read-heavy workloads. Column store indexes store data column-wise instead of row-wise, which can be more efficient for querying large datasets.
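
As an illustration of technique 4 (archiving and purging), here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in, not Snowflake, and the table names `events` and `events_archive`, the column names, and the cutoff date are all hypothetical. The idea is simply to copy old rows to an archive table and delete them from the active table in one transaction.

```python
import sqlite3


def archive_and_purge(conn, cutoff_date):
    """Copy rows older than cutoff_date to the archive table, then delete them.

    Dates are ISO-8601 strings (YYYY-MM-DD), so string comparison
    matches chronological order.
    """
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO events_archive SELECT * FROM events WHERE event_date < ?",
            (cutoff_date,),
        )
        purged = conn.execute(
            "DELETE FROM events WHERE event_date < ?", (cutoff_date,)
        ).rowcount
    return purged


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT, payload TEXT)"
)
conn.execute(
    "CREATE TABLE events_archive (id INTEGER PRIMARY KEY, event_date TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (event_date, payload) VALUES (?, ?)",
    [("2021-03-01", "old"), ("2022-07-15", "old"), ("2024-04-01", "recent")],
)
conn.commit()

purged_rows = archive_and_purge(conn, "2023-01-01")
active_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
archived_rows = conn.execute("SELECT COUNT(*) FROM events_archive").fetchone()[0]
print(purged_rows, active_rows, archived_rows)
```

Running both statements inside a single transaction means a failure part-way through cannot leave rows deleted but not archived.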


Monday, February 12, 2018

Part Three: The Analytics Strategy and Roadmap - A Use Case Driven Plan to Incrementally Build an Analytics Capability Powered by Big Data

In the previous blogs of this three-part series, I addressed a clear analytical divide that has grown in the industry, where relatively mature BI shops are at a definite advantage over most organizations that have yet to fully realize an analytics capability powered by big data. The first blog in this series identified the critical capabilities needed for analytical success with big data, and the many impediments, both technical and organizational, that are holding companies back.
Building on this idea in the second blog, I outlined why the attempt to take a ‘big bang’ approach to big data, by first putting all of the enterprise’s data into a data lake, is not likely to succeed because it returns little ROI in the short run and has major investment, governance, and skills requirements. Instead, I proposed establishing a multiplatform data warehouse environment (DWE) with an architecture pattern that's designed to accommodate immediate use cases with specific goals and measurable ROI, so the program can fund itself along the way.
In this approach, the requisite analytics capabilities will be gained through a managed transformation, an incremental build up in a phased approach, where the big data journey is mapped in clear, achievable but increasingly challenging milestones that induct the different nature and types of big data. The strategic roadmap on big data will be formulated based on these early successes, with more participation and sponsorship of the business when it starts to see value from this technology. That will help refine the tactical aspects of the strategy execution.
In this blog, I present a four-phased roadmap to get there, each phase building the pre-conditions to succeed with the next. The phases will of course overlap when work on a previous phase continues with other use cases. I will cite telecom use cases in the customer experience domain only to illustrate the comparative progression and analytical maturity in each phase.

The use-case driven approach starts with more technical, IT-driven challenges and matures eventually to departmental operational decisions and finally to strategic decision support. The maturity looks like this: 1) Data Warehouse Off-Loading, 2) Operational BI, 3) Operational Analytics, and 4) Strategic Analytics. The use cases will have to be evaluated in two major dimensions: the implementation capability and capacity needed, and the degree of organizational change required relating to their impact on current business processes.
In the early part of the transformation, the big data initiatives will be more technical in nature and localized at the department level. They will require the least additional skill and have a positive, if minimal, impact on business processes. In the later stages, the evolving use cases will have wider business impact and will demand more capacity and technical and organizational capabilities in big data. The final stage involves analytics adoption for use in organizational strategic planning.
As the later stages involve use cases that are more operational and strategic in nature, which can impact processes across many departments, they will demand a more robust organizational change management program to manage the change across different participating groups and additional governance requirements. Large companies will have multiple big data teams, and as the organization builds more advanced big data capabilities, teams will need to come together for interdepartmental use cases.

Phase 1: Offload data and workloads from legacy systems and the enterprise data warehouse

Like most other IT systems, as data warehouses age, their design and enabling technologies can become un-scalable in terms of their economics and performance. Adopting multiplatform data warehouse environments would solve many data storage and performance issues, which is why it is one of the strongest trends in data warehousing today. In this phase, high volume detail transaction data storage and processing will be off-loaded to a Hadoop platform, reducing the storage and computing resource requirements of the relational data warehouse platform. From a business viewpoint, this is a non-disruptive task. It preserves existing investments in data warehousing, and (when done well) it extends the life of an expensive and useful system.
The off-loaded detail data, which is hardly exploitable in a traditional RDBMS, will also become amenable to analytic exploitation because of the linearly scalable architecture of Hadoop. This increases the value of the detail data to the business, which will be able to get valuable insights from it by asking the right questions. Organizations can also explore the possibility of monetizing this detail data. For example, location-based and movement-over-time data can be obtained from Call Data Records in the telecom industry. Inducting mainframe data and/or offloading its processing to Hadoop, and active archiving of historical data, are other examples of IT use cases for this phase.
This phase will require a relatively small investment in the big data cluster: between 6 and 10 nodes, depending on the data volume to be off-loaded. In terms of investment and ROI, this phase will typically pay for itself through reduced infrastructure costs, improved performance of ETL processes and reports, and the additional value obtained from the detail data.
The foundation of big data capabilities for the organization will be laid in this phase: IT will get a foothold in Hadoop skills on familiar existing structured data. Data governance policies will be applied to the data off-loaded to Hadoop, and in doing that, the finer aspects of practicing the data governance principles and policies will be sorted out, again, on familiar territory of the data. An Agile development methodology with DevOps should be inducted in this phase, delivering value as early as possible while streamlining the support functions to the big data program.

Phase 2: Operational BI (event processing)

While the first phase is based on batch processing, the next will be based on near-real-time and subsequently real-time processing—starting with processing structured data, progressing to semi-structured and unstructured data.
It can start with rule-based event processing use cases on structured data (like fraud detection for telecom), which can happen in near real time, and then move on to processing more voluminous structured data on a more real-time basis (like identifying potential Mobile Switching Center failures and re-routing more profitable customers to a different Mobile Switching Center in real time to avoid service degradation).
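To make the idea of rule-based event processing concrete, here is a minimal sketch in Python. The rule itself (flag a subscriber who makes more than three international calls within ten minutes), the event layout, and every name and threshold are hypothetical assumptions for illustration only. They are not taken from any real fraud detection system, and a production deployment would run such rules on a streaming platform rather than over an in-memory list.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # hypothetical: 10-minute sliding window
MAX_INTL_CALLS = 3    # hypothetical: alert above this many calls per window


def detect_suspicious(events):
    """Apply one deterministic rule to a time-ordered stream of call events.

    Each event is a tuple (timestamp_seconds, subscriber_id, is_international).
    Returns a list of (timestamp, subscriber_id) alerts.
    """
    recent = defaultdict(deque)  # subscriber -> timestamps of recent intl calls
    alerts = []
    for ts, subscriber, is_international in events:
        if not is_international:
            continue
        window = recent[subscriber]
        window.append(ts)
        # Drop calls that have fallen out of the sliding window.
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_INTL_CALLS:
            alerts.append((ts, subscriber))
    return alerts


sample_events = [
    (0, "A", True), (60, "A", True), (90, "B", True), (120, "A", True),
    (180, "A", True),   # fourth international call by A within ten minutes
    (200, "B", False),  # domestic call, ignored by the rule
    (2000, "A", True),  # outside the window, so the count starts over
]
alerts = detect_suspicious(sample_events)
print(alerts)
```

Because the rule is a fixed threshold over a window, the outcome is fully deterministic and easy to explain to the operations team that acts on the alert.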
Semi-structured and unstructured data can be inducted for real-time event processing after these successes. Some telecom use cases could include analyzing customer interactions captured by a call center application to identify the key problems customers are complaining about. Sentiment analysis on this data can provide the intensity of customer dissatisfaction around these problems. The text analytics can be further improved by transcribing the recorded calls and using transcripts for this analysis. Further, voice analytics can be applied on recorded calls to measure the customer’s mood associated with the complaints. These analyses will not only provide statistics on overall complaints, but will be able to identify dissatisfied high-value customers in real time.
In this phase, the algorithms are mainly rule-based and fairly deterministic in nature, and the use cases can be limited by actionability and deployment confined to a single department, typically the departments that are showing more traction with the big data initiative, thus improving the chance of building more accurate models and ensuring deployment and use in operations.
The organization will develop the Hadoop data integration skills for different types of data in this phase. They will now have gradually developed a fairly advanced data governance capability and should have established data management policies and processes around it for these more exotic data types. There will be more pervasive use of these data sets by analysts through self-service exposed in analytical sandboxes. The induction of these new data sets will be closely linked with business use cases, data management practice (in terms of data ownership and accountability), ensuring enough data quality, capturing business metadata, security and privacy aspects, etc. Ideally it should not have great impediments and should have the requisite backing from the quarter of the business that will benefit from the use case. The data management process should be formalized through these implementations, developing the requisite controls and artifacts.
These parts of the business will now have adopted the use of big data and will have started realizing benefits from it. The organization will now be at the “Analytical Practitioners” level. The big data cluster will get much larger with the induction of these new high-volume data sources, but ideally it will be funded by the departments deploying the use cases.

Phase 3: Operational analytics

In phase 2, the data lake has been hydrated with varied structured, semi-structured, and unstructured data, and insights have been obtained from them. Typically, these datasets will progressively provide the customer 360-degree view, aggregating data from all customer touch points.
In phase 3, these insights can be combined using advanced analytic techniques to obtain predictive operational intelligence. For example, customer churn models will be deployed based on various types of data obtained on customer interactions in the previous phase. Campaign management algorithms can be refined based on this additional information. Call center data volume in different categories can be forecast based on historical patterns.
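As a rough illustration of forecasting call center volume from historical patterns, the sketch below uses a simple seasonal average in Python: the forecast for each day of the coming week is the mean of the same weekday over the last few weeks. This is one deliberately basic technique chosen for illustration, not the method the roadmap prescribes, and the function name, parameters, and sample volumes are all hypothetical.

```python
from statistics import mean


def seasonal_average_forecast(history, season_length=7, seasons=3):
    """Forecast one season ahead by averaging matching positions in past seasons.

    history: daily call volumes, oldest first, ending on the last day of a season.
    Returns season_length forecasts, one per position in the next season.
    """
    needed = season_length * seasons
    if len(history) < needed:
        raise ValueError("not enough history for the requested number of seasons")
    recent = history[-needed:]
    # recent[offset::season_length] picks the same weekday from each past week.
    return [mean(recent[offset::season_length]) for offset in range(season_length)]


# Hypothetical daily call volumes for three consecutive weeks (Mon..Sun).
call_volumes = [
    100, 120, 110, 105, 130, 60, 40,
    110, 125, 115, 100, 135, 65, 45,
    120, 130, 120, 110, 140, 70, 50,
]
forecast = seasonal_average_forecast(call_volumes)
print(forecast)
```

A real deployment would more likely fit a model that also captures trend and holidays, but even this baseline is useful as a benchmark that more advanced models must beat.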
Until phase 2, the big data program was tactical and bottom-up. Now it needs to be met with a top-down strategy to be effective at this next level. The input data, as well as the actions arising from the insights, will typically span departments. Hence, the big data program will need strategic direction and sponsorship at this phase, with leadership support for identifying the operational areas in which analytics can improve customer experiences most effectively, and for ensuring that the insights obtained drive and enhance the business processes involved.
This leadership is essential for gaining buy-in from managers in sales, service, and support functions applying such insights. Through such leadership, analytics professionals will be able to collaborate with business managers to refine the algorithms and gain feedback about what worked and what did not in applying the analytics in real-world sales, service, and support. Active participation from the business will also be needed in data governance with respect to usage of data and the related privacy issues, which will be more prevalent in this phase. But success in the previous phases should ideally ensure this participation and sponsorship.
The role of data scientists and domain specialists will become critical in this phase, and the company will have to invest in these skills. The organization is now moving towards being insight driven. Here, the business owners are putting faith in the predictions and forecasts from the predictive models, and the organization has the critical skill base and a robust data management capability. The people, the process, the data, and the technology are in place. They have become “Analytical Innovators”. Organizations will catch up with their competitors and probably break away from them based on the success in this phase.

Phase 4: Strategic analytics

In this phase, adoption of analytics pervades the organization, and the most critical business processes become insight driven. Now the CXOs consult the analytical insights in their decisions, and more strategic decisions also take the big data ‘outside in’ view into account. The enterprise planning becomes more agile by including external drivers derived from the big data, making it more responsive to changes in market conditions and customer behavior. For a telecom company, this would mean analytics driving their strategic planning on product mixes, new products, cell tower planning, etc.
The gradual transformation of the decision-making culture culminates in the use of data to make smarter business decisions that drive creativity and innovation, bringing the organization to the frontiers of the practice of analytics. At this point, the link between analytics and bottom-line impact is well established within the organization.

The next step in the big data journey

Success with advanced analytics has many daunting pre-requisites that put the relatively mature BI shops at clear advantage, yet an agile management culture tuned to the rapidly changing market conditions is going to be a pre-requisite to survival, if not success, in the next decade—adopting analytics is no longer a choice.
We have presented a practical roadmap to big data and analytics adoption based on successful practices in industry. This plan presumes nothing and builds on successes at each phase generating the pre-conditions for the next. It starts from IT use cases with no business impact, progressing to more and more impactful use cases as the requisite capability develops. This generic and high-level roadmap can be customized for an organization, depending on its business challenges and opportunities, its current analytical maturity, and its internal challenges towards big data adoption.

Thanks to Suman Ghosh from TCS for enlightening us on the concepts.

Wednesday, December 20, 2017

The Analytics Divide - Critical Capabilities for Analytics Powered by Big Data - Part 1

Competitive intensity has increased across industries in recent times. Companies are being driven to deliver a consistent stream of market successes via innovative business models and products or improved processes that continually enhance competitive advantage. Analytics powered by big data has been the propelling force behind this wave of innovation, and executives across industries are being challenged to replicate the ubiquitous success stories achieved with analytics.
However, the hype around analytics successes is tending to gloss over the critical enablers and hard work necessary to reach the end of that rainbow. Latent in that hype is an alternative reality where most companies are actually still struggling to figure out how to use analytics to take advantage of their data.
There is a deep analytical divide in the industry, which needs to be recognized. It can perhaps be explained only in the relative maturity of the prior BI programs of these analytical ‘haves’ where the critical enablers were more or less already in place. Most other organizations today have an analytics vision, but lack an analytics strategy backed up by a practical plan to get there. According to an MIT Sloan survey, only 30 percent of respondents overall declared having a formal long-term big data and analytics plan. As big data capabilities continue to become an enterprise enabler, those who have waited cannot remain in the harbor forever.
How to engineer the bridge over this divide is an extremely relevant topic for discussion today. In this three-blog series, I will analyze the enablers and impediments of big data adoption, identify the possibilities and priorities the industry has set regarding the big data and analytics domain, and look at the prevalent patterns and practices in the different journeys organizations are undertaking. I will share an incremental adoption roadmap based on these elements that will attempt to address the concerns that are holding back organizations, and I will suggest a reference architecture that supports that incremental build toward the more advanced capabilities and progressive complexities of a big data capability.

Enablers and impediments to analytics success

The Harvard Business Review explains that, at this point in the evolution of big data, the challenges for most companies are not related to technology. While gaining technology capabilities poses a challenge to adopting big data in the enterprise, many other factors play a big role, including culture, strategy, skills, and internal investments. Here are some key drivers and impediments to success with big data:
Data-driven culture: The previously mentioned MIT Sloan survey explains that most companies are not prepared for the robust investment and cultural changes that are required to achieve sustained success with analytics, including expanding the skill set of managers who use data, broadening the types of decisions influenced by data, and cultivating decision-making that blends analytical insights with intuition.
Deployment challenges: Leveraging the potential of predictive models has quite a few practical challenges. An article from explains that an analytical model has to produce consistent and repeatable results across the entire spectrum of input conditions and be simple enough to be deployed across all the operations impacted by the model. It has to be robust and responsive to changes in the business environment while operating within the limitations and constraints faced by the business, abide by all regulations that apply within the scope, and be intuitively explainable to management as well as to the frontline agent who, in turn, has to explain the outcome to a customer or a partner.
Strategic analytics plan: Companies that are successful with analytics are also much more likely to have a strategic plan for analytics, and this plan is usually aligned with the organization’s overall corporate strategy. These companies use analytics more broadly across the organization, and they are able to measure the results of their analytical efforts. The previously mentioned MIT Sloan survey highlights that the companies that have pulled away from the pack, “the Analytical Innovators,” are five times more likely to have a formal strategy for analytics than the least mature group. These companies recognize that they need to put in place a robust analytics culture. Data analytics is used by their C-suite for providing strategic direction to the whole organization and used by middle management to improve day-to-day operation of the organization.
Data privacy concerns: One of the biggest data challenges is around privacy and what is shared versus what is not shared. Self-service data access and broad data exploration that are crucial for analytics are also inherently risky in terms of privacy violations and compliance infractions. To avoid these problems, data governance policies need to be updated or extended to encompass data from the organization’s data lake, and users should be trained in how the policies affect their work with data in the lake. But there are very few data management professionals available for hiring who have prior experience with data lakes and Hadoop to frame these policies and implement them.
Skill gap: Big data technologies require a skill set that is new to most IT departments, which need expert data engineers to integrate all the relevant internal and external sources of data. Data scientists in a big data team should be comfortable speaking the language of business and helping leaders reformulate their challenges in ways that big data can tackle. In a world that’s flooded with data, it has become harder to use this data: there’s too much of it to make sense of unless the analysis starts with an insight or hypothesis to test. Here, the role of domain specialists has become absolutely essential for asking the right questions. People with these skills are hard to find and in great demand.
Drying up investments: As the hype around big data has ebbed, big data projects increasingly face the same expectations for results as other IT projects. Where companies were previously willing to fill data lakes through big data projects, executives may now want to see tangible business results faster to justify the initial and ongoing organizational investment in these projects.
Gaps in these success factors are preventing many organizations from embarking on their big data and analytics journey. Yet, an agile management culture tuned to rapidly changing market conditions is going to be a pre-requisite to survival, if not success, in the next decade. Closing that capability gap is becoming mandatory for those that have yet to embrace analytics.
The requisite capabilities can only be gained through a managed transformation, an incremental build-up in a phased approach where the big data journey is mapped in clear, achievable but increasingly challenging milestones. Here, success in each phase brings in the capabilities required for the next level of complexity, both in implementation and in organizational change management.
In the next part of this blog, I will talk about the incremental complexities and more advanced capabilities and skills needed in inducting the different nature and types of big data, and the evolving architecture patterns needed to support that progressive complexity.

Source: Suman Ghosh (Center of Excellence at TCS)