
Tuesday, March 29, 2011

Analysis Services Distinct Count Optimization Using Solid State Devices

A Real Practices Case Study at Xbox LIVE
Authors: Joseph Szymanski, Tyson Solberg, Denny Lee
Technical Reviewers: Lindsey Allen, Akshai Mirchandani, Heidi Steen

Executive Summary

To expand on the distinct count optimization techniques provided in the Analysis Services Distinct Count Optimization white paper, this technical note shows how using solid state devices (SSDs) can improve distinct count measures. We recount the experiences of the Microsoft Entertainment and Devices Data Warehousing Team (known for Xbox, Xbox LIVE, XNA, and Zune) in our analysis of applying SSDs to a real-world, distinct count heavy, Microsoft SQL Server Analysis Services customer environment. The key conclusion is that enterprise SSD devices, when combined with a well-optimized Analysis Services MOLAP cube, will drastically improve the performance and scalability of the cube when it accesses distinct count measures. They can also improve non-distinct count measures, if the calculations being performed rely heavily on storage-engine calculations.

Purpose

Analysis Services distinct count measures are extremely expensive in all aspects of an Analysis Services solution – the time requirements for processing the data, the long query durations, and large storage space requirements. Often the best approach is to convince the analysts using your cubes to use alternate measures or calculations. However, in many cases the distinct count-based Key Performance Indicators (KPIs) are key components of business analytics systems. In such cases, the focus has to move from "Are you sure you need distinct count?" to "How can we make distinct count queries fast(er)?"
One way to improve the performance of distinct count measures is to change the business problem that they attempt to measure (for example, limiting the time range for distinct counts to specific years or months rather than all years). However, when you have exhausted your ability to reduce the solution’s complexity by simplifying business processes, you can turn to other techniques to improve your solution. This paper endeavors to share several that we came across in real-world enterprise usage scenarios.
As originally described in Analysis Services Distinct Count Optimization, distinct count calculations are Storage-Engine-heavy, resulting in heavy disk I/O utilization (that is, higher latencies and slower throughput). This leads us to the first and foremost among those techniques: the use of solid state devices (SSDs), which have taken the world by storm. Given the high initial cost, relative newness, and lower capacity per device, it is often difficult to find a good fit for them in the enterprise. However, Analysis Services dovetails well with the SSD story because SSDs offer ultra-high read speeds for both random and sequential I/O. Analysis Services queries generally access a large amount of relatively static data and SSDs provide the speed and flexibility required. For example:
  • Analysis Services scenarios are well suited to SSDs because most SSDs are designed for fast read performance.
  • The biggest benefit that SSDs offer over physical disks is that they provide random I/O read speed that is nearly as high as sequential I/O read speed – orders of magnitude faster than spin disks.
  • Data stored in MOLAP cubes is an ideal target for SSDs: most SSDs are rather low in capacity, but Analysis Services MOLAP cubes are generally small in size when compared to their data warehouse source. For example, in our sample case we have a 10-terabyte warehouse, which would cost an extraordinary amount of money to move to SSDs, but a 160 GB cube, which would be very easy and inexpensive to move to SSDs.
  • While the initial costs of SSDs are higher than those of spin disks, the overall lifetime costs of SSDs are comparable to spin disks because of cooling costs, differences in power consumption, and general maintenance costs associated with your storage (SSDs typically have lower maintenance costs).

Scenario

This paper covers the business scenario in which business users seek to improve the performance of their Analysis Services distinct count operations. As an example, the Microsoft SQLCAT team worked with the Microsoft Entertainment and Devices Data Warehousing team, which needed to build its critical KPIs on distinct counts. We decided it was worthwhile to invest in a significant research effort to make those distinct count queries run faster through better design methodologies and the use of SSDs.
The engineering team, examining the scenario at hand, endeavored to find out the following:
  • Can distinct counts be made to run faster, given a faster I/O system or a better cube design?
  • Will such improvements be cost-effective?
  • Will such improvements improve or hinder scalability, compared with the current design?

Datasets

The database is a real-world production dataset with the following characteristics.
Dataset | Size
MOLAP Cube | 120 GB
SQL DW | 10.0 terabytes
The cube design, for the distinct counts, is relatively simple, as shown in Figure 1.
Figure 1: Usage Cube Schema

OLAP Query Characteristics

One of the primary business questions that needed an answer was: "How many distinct users used the service for <a time period>?" To get this answer, we focused on a single simple MDX statement and sought to make it run as fast as possible from a cold cache (that is, we cleared all existing caches to force the data to load from the pertinent device).
SELECT [Measures].[Online Unique Users] ON 0,
[Date].[Date].[Date] ON 1
FROM [Usage]
WHERE
[Date].[Calendar Month].[Calendar Month].&[2009]&[1]
The measure "Online Unique Users" is a distinct count measure that, for the time period of the month selected, scans through 300 million SQL records in the MOLAP cube dataset.

Test Hardware

We utilized a number of servers to perform our query tests.
User Acceptance Testing Server
HP DL580 G5 4-socket, quad core, 2.4 GHz, 64 GB RAM server connected to a high end SAN array; Chart Label: "UAT SAN"

Development Server
Dell R710 2-socket, quad core, 2.93 GHz, 72 GB RAM
Disk Arrays
  • 1 Fusion-io ioDrive PCI-E SSD device; Chart Label: "Dev SSD"
  • Two Dell MD1000 enclosures (16 x 750 GB 7200RPM drives); Chart Label: "Dev Hard Drives"

SQL Customer Advisory Team Server
Dell R905 4-socket, quad core, AMD server with 4 Fusion-io ioDrive PCI-E SSD devices; Chart Label: "SQLCAT SSD"


We’d like to thank Fusion-io, Dell, and Hewlett-Packard for the use of and support with their hardware.


Analysis

As originally described in Analysis Services Distinct Count Optimization, distinct count calculations are Storage-Engine-heavy, resulting in heavy disk I/O utilization. Therefore, the original hypothesis for this case study was modeled after the reasonable argument: "If we are disk I/O bound, and Analysis Services provides a 100 percent random read load, SSDs should drastically improve performance."

Initial Query Performance Comparison Between SSDs and Spin Disks

Let’s start by comparing the query performance between SSDs and spin disks (local hard drives and SAN).
Figure 2: Initial Performance Optimization (Lower Is Better)
As can be seen in Figure 2, the use of Fusion-io SSDs resulted in dramatically faster query performance when compared to local drives or SAN drives. For example, for the seven-month query, SSDs were 5.5 times faster than the SAN hardware.
But as with any successful BI performance improvement, Analysis Services users also asked more resource-intensive questions such as: "For the last year, how many distinct people did <X>?" That is, these questions resulted in distinct count queries that covered a year’s worth of data across a substantially larger dataset. In an attempt to find further performance improvements, the engineering team profiled the cube during query time, in production, and found a very odd thing: Disk usage and CPU usage were nowhere near the limits of what the system was able to handle.

This finding was counterintuitive, because all tests were run on a system that was dedicated to this exercise (that is, the system had no other users or processes running concurrently); it was not a matter of resource contention. We then worked with the SQLCAT team and other experts to find out why Analysis Services was not using more system resources to query the cube, starting with an investigation of how the partitions were defined.

The Move to an Optimally Parallel Cube

As noted in Analysis Services Distinct Count Optimization, partitioning significantly improves distinct count query performance. Creating distinct buckets based on distinct value and time forces the Analysis Services Storage Engine to fire off many more threads – one for each partition – and therefore calculate the distinct value more quickly. But if partitions are designed with an uneven number of distinct values (as in Figure 3), the query may ultimately become single-threaded (even though all four partitions are being queried) because the Analysis Services Storage Engine waits for the largest partition (the data file with values from 1,500 to 20,000) to complete its calculations. This behavior explained the puzzling results around disk and CPU consumption in the earlier tests.
Figure 3: Unevenly Distributed Partitions
A more evenly distributed set of distinct values within the partitions results in all four threads completing at approximately the same time, resulting in minimal spinning and wait time while the calculations are completed. Clearly, parallel queries across the partitions had to be part of our solution.
Figure 4: Evenly Distributed Partitions
Figure 4 shows the even distribution of data among the partitions, which is a key concept for distinct count query optimization. The Analysis Services Storage Engine will initially query the header file to determine which data partitions to query for the range of distinct count values. This way, the storage engine queries only partitions that have the values required to complete the calculation.
After extensive testing, we rediscovered some important rules and added them to the distinct count partitioning strategy to ensure all distinct count queries are optimally parallel:
The distinct count measure must be contained directly in the partition query.
If you partition your cube by the hash of a UserID distinct value, it is important that your query perform a distinct count of the hash of the UserID – not the distinct count of the UserID itself. For fast distinct count query performance, it is important for the distinct count value itself to be placed in its own periodic partition (for example, User 1 repeatedly shows up in only month 1 partition 1, User 100 in month 1 partition 2, and so on) and for the values to be non-overlapping (for example, Users 1-99 in month 1 partition 1, Users 100-199 in month 1, partition 2, and so on). The hashing will cause the records in the same range to be distributed across multiple partitions, therefore losing the non-overlapping behavior. Even if the UserID and the hash of the UserID have the same distribution of data, and even if you partition data by the latter, the header files contain only the range of values associated with the hash of the UserID. This ultimately means that the Analysis Services Storage Engine must query all of the partitions to perform the distinct on the UserID. For more information about these concepts, see the white paper Analysis Services Distinct Count Optimization.
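One way to satisfy this rule (a sketch under assumed table and column names, not something prescribed by the white paper) is to compute the hash once in the relational layer and then reference that single column both as the distinct count source column and in every partition slicer:

-- Hypothetical relational view feeding the measure group; dbo.FactUsage and
-- its columns are assumed names. The same UserIdHash column is then used as
-- the distinct count source column AND in each partition's
-- WHERE UserIdHash BETWEEN a AND b slicer, so the partition header files
-- describe non-overlapping ranges of exactly the value being counted.
CREATE VIEW dbo.vFactUsage
AS
SELECT f.DateKey,
       f.ActivityCount,
       CAST(HASHBYTES('SHA1', CAST(f.UserID AS VARCHAR(20))) AS BIGINT) AS UserIdHash
FROM dbo.FactUsage AS f;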

The distinct count values need to be continuous.
As implied in Figures 3 and 4, each partition has a continuous range of values so that the partition contains the values from 100 – 20,000 (in this example). Based on the empirical evidence we gathered in our testing for this case, it appears that distinct count query performance improves if the values within the partitions are continuous.
After we followed these two rules, we were easily able to improve query parallelism with very few changes.
More specifically, we analyzed our data size, selected a month as the coarse time grain for the distinct count measure group partitions, and then sub-selected the data, per month, into <n> partitions, where n is the number of physical CPU cores on the OLAP hardware. We made this decision after we identified a number of options, tested them, and found this particular grain to be the best for our set of data. Other than the partition changes, the cube design stayed the same, and we did not alter any aggregations for this cube. Note that we had already followed the established guidelines of the SQLCAT white paper Analysis Services Distinct Count Optimization.
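As an illustration only (the table name, column names, and boundary values below are placeholders, not the production values), the resulting partition source queries for a single month on a four-core server follow this pattern, partitioned directly on UserID – one of the "optimally parallel" combinations in the table that follows:

-- Month = January 2009, split into continuous, non-overlapping UserID ranges,
-- one partition per physical core.
SELECT * FROM dbo.FactUsage            -- partition [Usage 2009-01 P1]
WHERE DateKey BETWEEN 20090101 AND 20090131
  AND UserID  BETWEEN 1 AND 5000000;

SELECT * FROM dbo.FactUsage            -- partition [Usage 2009-01 P2]
WHERE DateKey BETWEEN 20090101 AND 20090131
  AND UserID  BETWEEN 5000001 AND 10000000;

-- ...and so on through partition P<n>, where n = the number of physical cores.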
Note: To allow for more repeatable distinct count query comparison, the cube used here contained no aggregations on distinct count measures.
The following table lists various combinations of measure group slicer queries and distinct count measures and indicates whether each combination produces optimally parallel queries.

SQL query WHERE clause | Analysis Services distinct count member | Is the query optimally parallel? If not, why?
WHERE userid % 16 = 0 | userid | NO: the query does not return a continuous dataset.
WHERE CAST(HASHBYTES('SHA1', CAST(userid AS VARCHAR)) AS BIGINT) BETWEEN a AND b | userid | NO: the Analysis Services member "userid" is not contained directly in the query.
WHERE userid BETWEEN a AND b | userid | YES
WHERE userid % 16 = 0 | CAST(HASHBYTES('SHA1', CAST(userid AS VARCHAR)) AS BIGINT) | NO: the query does not return a continuous dataset.
WHERE CAST(HASHBYTES('SHA1', CAST(userid AS VARCHAR)) AS BIGINT) BETWEEN a AND b | CAST(HASHBYTES('SHA1', CAST(userid AS VARCHAR)) AS BIGINT) | YES
WHERE userid BETWEEN a AND b | CAST(HASHBYTES('SHA1', CAST(userid AS VARCHAR)) AS BIGINT) | NO: the Analysis Services member (hash of userid) is not contained directly in the query.



Now That We Have an Optimally Parallel Cube…

The results were stunning, as shown by the "V2" lines for both SSDs and hard disk drives (HDDs) (where "V2" is the version 2 cube, which follows the optimizations discussed earlier in this paper).
Figure 5: Performance after enabling "multi-threaded mode".
(Note that in Figure 5, the V2 cube performed at the same speed, on SSDs, in all environments. We show only the SQLCAT line for simplicity.)
The conclusion is that by adding enough I/O (through SSDs so that I/O was no longer a bottleneck), we were able to find and resolve the algorithmic issues, enabling incredible performance. The key is that we never would have found these algorithmic issues without first removing the I/O channel bottlenecks by the use of SSDs.

But Wait, Processes Get in the Way!

At this point, due to operational issues, we were initially unable to go live with the SSD servers. It was agreed that given these results, we should go with the easier-to-implement DAS HDD solution, which offered similar performance. Specifically, our challenge to going live was that, due to a lack of SSD enterprise standards as of mid-2009, the supportability story was too complicated to be sustainable across a large number of servers.
In retrospect, not waiting for SSDs was a serious mistake. The cube went live and user satisfaction plummeted. What went wrong? The results shown in Figure 5 were correct, but somehow, performance was awful.

Parallel User Load

Figure 5, though accurate, shows query response times for a single user only. It does show that, with a solidly parallel cube, good DAS can be nearly as fast as SSDs for reasonably large MOLAP cubes. But a more in-depth review found that our initial analysis had failed to consider a highly parallel user load and had not benchmarked a large enough multi-user parallel Analysis Services query load.
Had this been done sooner, the patterns illustrated in Figures 6 and 7 would have been found.
Figure 6: Distinct count querying comparison of different access patterns
Figure 7: Distinct count query comparison of different access patterns (time to respond to first query)
To execute the parallelization scenario, the engineering team used a tool to execute multiple Analysis Services queries in parallel, with the following characteristics:
  • Each query selected a distinct dataset – no overlaps between data.
  • Each query was run in two concurrency modes: all at the same time, and with 30 seconds between queries, to simulate a real user load.
  • Three access patterns were selected and timed: Running six queries serially, running six queries concurrently, and running twelve queries concurrently.
  • Each test run was executed multiple times, clearing caches between runs; the times indicated are averages.
  • Two measures were recorded: the time for the first submitted query to complete, and the total time for all queries to complete.
After we captured and analyzed this data, our conclusion was clear: Single-user queries benefitted from the use of SSDs. As soon as the number of concurrent business users increased, the benefit of SSD became even more apparent.
From a technical perspective, SSDs allow many more threads of execution to run in parallel without incurring huge I/O wait times, because their random I/O throughput is basically the same as their sequential I/O throughput. This benefit is relevant because multiple independent queries, serviced simultaneously, implicitly cause random I/O at the disk level, and unlike rotating disks, SSD devices do not slow down under random I/O. Though rotating disks slow down to a three-digit number of I/O operations per second when access is random, high-end SSD devices continue to deliver a five-digit number of I/O operations per second, sequential or random. This directly translates into more parallel queries, and therefore more concurrent users, per server when the server's I/O system is based on high-end SSD technology.

Recommendations

Here is a list of conclusions we drew from the work we did for the Microsoft Entertainment and Devices Data Warehousing Team. Most have been discussed in this paper, but some are general best practices that we want to share with you:
  • Remove I/O bottlenecks by adding sufficiently fast underlying disk I/O; once they are gone, it is much easier to find algorithmic bottlenecks in SQL Server Analysis Services implementations.
  • For distinct counts (and other Storage-Engine-heavy query loads), the Analysis Services workload is well suited to an SSD-based I/O back end.
  • When you evaluate changes to an Analysis Services cube, testing single-user query performance is not enough. If you do not create an independently parallel load, you are not properly simulating usage patterns, because your users are creating parallel loads.
  • It is critical to be aware of your production workload, to monitor the queries being run and the performance of the system servicing the queries.
  • Even a simple query can stress Analysis Services distinct count performance – it is critical to consider the size of the dataset that a distinct count query covers to accurately assess the query’s performance.
  • Follow these rules for making sure Analysis Services can parallelize distinct count queries, in addition to the existing standards and practices for partitioning:
    • Make sure that the distinct count attribute is directly used in the partitioning query.
    • Make sure that the partitioning query function (for all sub-partitions in a single time period) is continuous. Using a hash function and BETWEEN is one way to do this that works well.
  • When benchmarking, if you are testing cold-cache scenarios, be sure that you run multiple times and clear all caches between runs. Don’t accept a result as true until you can reproduce it.


Summary

The results of our analysis are clear: SSD technology has significant benefits for MOLAP-based Analysis Services solutions. Because concurrent users implicitly create random I/O patterns, solid-state devices enable greater scalability and per-user performance. In the past, before the advent of SSD technology, getting very high end parallel random I/O performance required a complex and very expensive solution. SSDs offer these benefits without the prohibitively high cost.


http://sqlcat.com/technicalnotes/archive/2010/09/20/analysis-services-distinct-count-optimization-using-solid-state-devices.aspx

Analysis Services Query Performance Top 10 Best Practices

Published: June 7, 2007

Proper cube design, efficient multidimensional expressions (MDX), and sufficient hardware resources are critical to optimal performance of MDX queries issued against a SQL Server 2005 Analysis Services instance. This article lists the ten best practices that the Microsoft SQL Server development team most commonly recommends with respect to optimizing Analysis Services query performance. For additional discussions about Analysis Services best practices related to query performance, see The Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.
1. Optimize cube and measure group design
  • Define cascading attribute relationships (for example Day > Month > Quarter > Year) and define user hierarchies of related attributes (called natural hierarchies) within each dimension as appropriate for your data. Attributes participating in natural hierarchies are materialized on disk in hierarchy stores and are automatically considered to be aggregation candidates. User hierarchies are not considered to be natural hierarchies unless the attributes comprising the levels are related through cascading attribute relationships. With SQL Server 2005 Service Pack 2 (SP2), a warning appears in Business Intelligence Development Studio with each user hierarchy that is not defined as a natural hierarchy.
  • Remove redundant relationships between attributes to assist the query execution engine in generating the appropriate query plan. Attributes need to have either a direct or an indirect relationship to the key attribute, not both.
  • Keep cube space as small as possible by only including measure groups that are needed.
  • Place measures that are queried together in the same measure group. A query that retrieves measures from multiple measure groups requires multiple storage engine operations. Consider placing large sets of measures that are not queried together into separate measure groups to optimize cache usage, but do not explode the number of measure groups.
  • Minimize the use of large parent-child hierarchies. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute (for example, the All attribute) unless it is disabled. As a result, queries returning cells at intermediate levels are calculated at query time and can be slow for large parent-child dimensions. If you are in a design scenario with a large parent-child hierarchy (more than 250,000 members), you may want to consider altering the source schema to reorganize part or all of the hierarchy into a user hierarchy with a fixed number of levels.
  • Optimize many-to-many dimension performance, if used. When you query the data measure group by the many-to-many dimension, a run-time “join” is performed between the data measure group and the intermediate measure group using the granularity attributes of each dimension that the measure groups have in common. Where possible, reduce the size of the intermediate fact table underlying the intermediate measure group. To optimize the run-time join, review the aggregation design for the intermediate measure group to verify that aggregations include attributes from the many-to-many dimension.
To understand how to optimize dimensions to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005. For assistance in analyzing your design for compliance with best practices, see the February 2007 Community Technology Preview (CTP) release of the SQL Server 2005 Best Practices Analyzer (the final version should be released soon).
2. Define effective aggregations
  • Define aggregations to reduce the number of records that the storage engine needs to scan from disk to satisfy a query. If SQL Server Profiler traces indicate that most user queries that are not resolved from cache are resolved by partition reads rather than aggregation reads, consider using the Aggregation Manager sample application to design custom aggregations. This sample is available on CodePlex at http://www.codeplex.com/MSFTASProdSamples and a version of this sample that has been updated by the community is available on CodePlex at http://www.codeplex.com/bidshelper.
  • Avoid designing an excessive number of aggregations. Excessive aggregations reduce processing performance and may reduce query performance. While the optimum number of aggregations varies, the SQL Server Best Practices team’s experience has been that the optimum number is in the tens, not hundreds or thousands in almost all cases.
  • Enable the Analysis Services query log to capture user query patterns and use this query log when designing aggregations. For more information, see Configuring the Analysis Services Query Log.
To understand how to design aggregations to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.
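Once the query log is enabled, Analysis Services writes it to a relational table (OlapQueryLog by default), which you can inspect directly. A minimal sketch, assuming the default table name and columns:

-- Frequently requested datasets with long average durations are candidates
-- for new aggregations (table and column names are the defaults created by
-- the QueryLog server properties; adjust if your configuration differs).
SELECT MSOLAP_Database,
       MSOLAP_ObjectPath,
       Dataset,
       COUNT(*)      AS Requests,
       AVG(Duration) AS AvgDurationMs
FROM dbo.OlapQueryLog
GROUP BY MSOLAP_Database, MSOLAP_ObjectPath, Dataset
ORDER BY Requests DESC;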
3. Use partitions
  • Define partitions to enable Analysis Services to query less data to resolve a query when it cannot be resolved from the data cache or from aggregations. Also define the partitions to increase parallelism when resolving queries.
  • For optimum performance, partition data in a manner that matches common queries. A very common choice for partitions is to select an element of time such as day, month, quarter, year or some combination of time elements. Avoid partitioning in a way that requires most queries to be resolved from many partitions.
  • In most cases, partitions should contain fewer than 20 million records, and each measure group should contain fewer than 2,000 total partitions. Also, avoid defining partitions containing fewer than two million records. Too many partitions cause a slowdown in metadata operations, and too few partitions can result in missed opportunities for parallelism (see the row-count sketch below).
  • Define a separate ROLAP partition for real-time data and place the real-time ROLAP partition in its own measure group.
To understand how to design partitions to increase query performance, refer to the SQL Server 2005 Analysis Services Performance Guide, the Microsoft SQL Server Customer Advisory Team blog, and OLAP Design Best Practices for Analysis Services 2005.
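To sanity-check candidate partition boundaries against this row-count guidance, a quick count per proposed slice in the relational source is usually enough. A sketch with assumed table and column names (not from the articles above):

-- Rows per calendar month; months far above ~20 million rows are candidates
-- for further sub-partitioning, months far below ~2 million for merging.
-- Assumes an integer DateKey in yyyymmdd form, for example 20090115.
SELECT DateKey / 100 AS YearMonth,
       COUNT_BIG(*)  AS FactRows
FROM dbo.FactUsage
GROUP BY DateKey / 100
ORDER BY YearMonth;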
4. Write efficient MDX
  • Remove empty tuples from your result set to reduce the time spent by the query execution engine serializing the result set.
  • Avoid run-time checks in an MDX calculation that result in a slow execution path. If you use the CASE and IIF functions to perform condition checks that must be resolved many times during query resolution, you will have a slow execution path. Rewrite these calculations using the SCOPE statement to quickly reduce the calculation space to which an MDX calculation refers. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR, Comparing Levels in MDX and CONDITION vs. SCOPE in cell calculations, and Multiselect friendly MDX calculations.
  • Use Non_Empty_Behavior where possible to enable the query execution engine to use bulk evaluation mode. However, if you use Non_Empty_Behavior incorrectly, you will return incorrect results. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR and Averages, ratios, division by zero and NON_EMPTY_BEHAVIOR.
  • Use EXISTS rather than filtering on member properties to avoid a slow execution path. Use the NonEmpty and Exists functions to enable the query execution engine to use bulk evaluation mode.
  • Perform string manipulations within Analysis Services stored procedures using server-side ADOMD.NET rather than with string manipulation functions such as StrToMember and StrToSet.
  • Rather than using the LookupCube function, use multiple measure groups in the same cube wherever possible.
  • Rewrite MDX queries containing arbitrary shapes to reduce excessive subqueries where possible. An arbitrary shaped set is a set of members that cannot be resolved as a crossjoin of sets with a single hierarchality. For example, the set {(Gender.Male, Customer.USA), (Gender.Female, Customer.Canada)} is an arbitrary set. You can frequently use the Descendants function to resolve arbitrary shapes by using a smaller number of subqueries than queries that return the same result that are written using other functions.
  • Rewrite MDX queries that result in excessive prefetching where possible. Prefetching is a term used to describe cases where the query execution engine requests more information from the storage engine than is required to resolve the query at hand for reasons of perceived efficiency. Generally, prefetching is the most efficient data retrieval choice. However, occasionally it is not. In some cases you may be able to eliminate excessive prefetching by rewriting queries with a subselect in the FROM clause rather than a set in the WHERE clause. When you cannot eliminate excessive prefetching, you may need to disable prefetching and warm the cache using the Create Cache statement. For more information, see How to Warm up the Analysis Services data cache using Create Cache statement
  • Filter a set to reduce the cube space before using it in a crossjoin (see the MDX sketch after this list).
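For instance, the points above about removing empty tuples and filtering a set before a crossjoin combine into the following MDX sketch; it reuses the Usage cube and measure from the first article in this post, while the [Customer].[Country] hierarchy and the threshold are hypothetical:

// Reduce the first set with FILTER before the CROSSJOIN, and use NON EMPTY so
// empty tuples are not serialized back to the client.
SELECT [Measures].[Online Unique Users] ON 0,
       NON EMPTY
       CROSSJOIN(
           FILTER([Customer].[Country].[Country].MEMBERS,
                  [Measures].[Online Unique Users] > 1000),
           [Date].[Calendar Month].[Calendar Month].MEMBERS
       ) ON 1
FROM [Usage]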
5. Use the query engine cache efficiently
  • Ensure that the Analysis Services computer has sufficient memory to store query results in memory for re-use in resolving subsequent queries. To monitor, use the MSAS 2005:Memory\Cleaner Memory shrinkable KB and the MSAS 2005:Cache\Evictions/sec Performance Monitor counters.
  • Define calculations in the MDX script. Calculations in the MDX script have a global scope that enables the cache related to these queries to be shared across sessions for the same set of security permissions. However, calculated members defined using Create Member and With Member within user queries do not have global scope and the cache related to these queries cannot be shared across sessions.
  • Warm the cache by executing a set of predefined queries using the tool of your choice. You can also use a Create Cache statement for this purpose. For more information on using the Create Cache statement, see How to Warm up the Analysis Services data cache using Create Cache statement. For information on how to use SQL Server 2005 Integration Services to warm the cache, see Build Your Own Analysis Services Cache-Warmer in Integration Services.
  • Rewrite MDX queries containing arbitrary shapes to optimize caching. For example, in some cases you can rewrite queries that require non-cached disk access such that they can be resolved entirely from cache by using a subselect in the FROM clause rather than  a WHERE clause. In other cases, a WHERE clause may be a better choice.
6. Ensure flexible aggregations are available to answer queries
  • Note that incrementally updating a dimension by using ProcessUpdate drops all flexible aggregations affected by updates and deletes and, by default, does not re-create them until the next full process.
  • Ensure that aggregations are re-created by processing affected objects, configuring lazy processing, performing ProcessIndexes on affected partitions, or performing full processing on affected partitions.
To understand how to ensure flexible aggregations are not dropped, refer to the SQL Server 2005 Analysis Services Performance Guide.
7. Tune memory usage
  • Increase the size of the paging files on the Analysis Services server or add additional memory to prevent out-of-memory errors when the amount of virtual memory allocated exceeds the amount of physical memory on the Analysis Services server.
  • Use Microsoft Windows Advanced Server® or Datacenter Server with SQL Server 2005 Enterprise Edition (or SQL Server 2005 Developer Edition) when you are using SQL Server 2005 (32-bit) to enable Analysis Services to address up to 3 GB of memory. To enable Analysis Services to address more than 2 GB of physical memory with either of these editions, use the /3GB switch in the boot.ini file. If you set the /3GB switch in the boot.ini file, the server should have at least 4 GB of memory to ensure that the Windows operating system also has sufficient memory for system services.
  • Reduce the value for the Memory/LowMemoryLimit property below 75 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
  • Reduce the value for the Memory/TotalMemoryLimit property below 80 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
  • Keep a gap between the Memory/LowMemoryLimit property and the Memory/TotalMemoryLimit property – 20 percent is frequently used.
  • When query thrashing is detected in a multi-user environment, contact Microsoft Support for assistance in modifying the MemoryHeapType.
  • When running on non-uniform memory access (NUMA) architecture and VirtualAlloc takes a very long time to return or appears to stop responding, upgrade to SQL Server 2005 SP2 and contact Microsoft Support for assistance with appropriate settings for pre-allocating NUMA memory.
To understand when to consider changing default memory use, refer to the SQL Server 2005 Analysis Services Performance Guide and Microsoft SQL Server Customer Advisory Team blog.
8. Tune processor usage
  • To increase parallelism during querying for servers with multiple processors, consider modifying the Threadpool\Query\MaxThreads and Threadpool\Process\MaxThreads options to be a number that depends on the number of server processors.
    • A general recommendation is to set the Threadpool\Query\MaxThreads to a value of less than or equal to two times the number of processors on the server. For example, if you have an eight-processor server, the general guideline is to set this value to no more than 16. In practical terms, increasing the Threadpool\Query\MaxThreads option will not significantly increase the performance of a given query. Rather, the benefit of increasing this property is that you can increase the number of queries that can be serviced concurrently.
    • A general recommendation is to set the Threadpool\Process\MaxThreads option to a value of less than or equal to 10 times the number of processors on the server. This property controls the number of threads used by the storage engine during querying operations as well as during processing operations. For example, if you have an eight-processor server, the general guideline is setting this value to no more than 80. Note that even though the default value is 64, if you have fewer than eight processors on a given server, you do not need to reduce the default value to throttle parallel operations.
  • While modifying the Threadpool\Process\MaxThreads and Threadpool\Query\MaxThreads properties can increase parallelism during querying, you must also consider the additional impact of the CoordinatorExecutionMode option. For example, if you have a four-processor server and you accept the default CoordinatorExecutionMode setting of -4, a total of 16 jobs can be executed at one time across all server operations. So if 10 queries are executed in parallel and require a total of 20 jobs, only 16 jobs can launch at a given time (assuming that no processing operations are being performed at that time). When the job threshold has been reached, subsequent jobs wait in a queue until a new job can be created. Therefore, if the number of jobs is the bottleneck to the operation, increasing the thread counts may not necessarily improve overall performance.
9. Scale up where possible
  • Use a 64-bit architecture for all large systems.
  • Add memory and processor resources and upgrade the disk I/O subsystem, to alleviate query performance bottlenecks on a single system.
  • Avoid linking dimensions or measure groups across servers and avoid remote partitions whenever possible because these solutions do not perform optimally.
10. Scale out when you can no longer scale up
  • If your performance bottleneck is processor utilization on a single system as a result of a multi-user query workload, you can increase query performance by using a cluster of Analysis Services servers to service query requests. Requests can be load balanced across two Analysis Services servers, or across a larger number of Analysis Services servers to support a large number of concurrent users (this is called a server farm). Load-balancing clusters generally scale linearly.
  • When using a cluster of Analysis Services servers to increase query performance, perform processing on a single processing server and then synchronize the processing and the query servers using the XMLA Synchronize statement, copy the database directory using Robocopy or some other file copy utility, or use the high-speed copy facility of SAN storage solutions.

    http://technet.microsoft.com/en-us/library/cc966527.aspx

Thursday, March 24, 2011

To create an Analysis Services project using the Import Analysis Services 9.0 Database template





1. On the File menu, select New and then select Project, press Ctrl+Shift+N, or click New Project on the toolbar to display the New Project dialog box.

2. Select Business Intelligence Projects in the Project types pane of the New Project dialog box.

3. Select Import Analysis Services 9.0 Database from the Visual Studio installed templates category of the New Project dialog box.

4. Type the name of the project in the Name text box.

5. In the Location dropdown list, type or select the folder in which to store the files for the project, or click Browse to select a folder.

6. In the Solution dropdown list, either select Add to Solution to add the new project to the existing solution, or select Create new Solution to create a new solution. If Create new Solution is selected, select Create directory for solution to create a new folder for the new solution, and then type the name of the new solution in Solution Name.

7. Click OK.




How to create an Analysis Services project from an Analysis Services database

Generate random number


Random number generation

-- Generate a random integer between @RangeStart and @RangeEnd (inclusive).
DECLARE @RangeStart INT
DECLARE @RangeEnd   INT
    SET @RangeStart = 1000
    SET @RangeEnd   = 9999

-- Seed RAND() from NEWID() so a fresh value is produced on every evaluation.
 SELECT CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*(@RangeEnd-@RangeStart+1)+@RangeStart) AS INT)
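Why the NEWID() seed matters: a bare RAND() is evaluated only once per statement, so every row of a multi-row SELECT would receive the same number. A small usage sketch (the system view is just a convenient row source for illustration):

-- Each row gets its own 4-digit code in the 1000-9999 range because the
-- NEWID()-based seed is re-evaluated per row, unlike a bare RAND().
SELECT d.name,
       CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY)) * (9999 - 1000 + 1) + 1000) AS INT) AS RandomCode
FROM sys.databases AS d;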

date format style number

Without century (yy) | With century (yyyy) | Standard | Input/Output**
- | 0 or 100 (*) | Default | mon dd yyyy hh:miAM (or PM)
1 | 101 | USA | mm/dd/yy
2 | 102 | ANSI | yy.mm.dd
3 | 103 | British/French | dd/mm/yy
4 | 104 | German | dd.mm.yy
5 | 105 | Italian | dd-mm-yy
6 | 106 | - | dd mon yy
7 | 107 | - | mon dd, yy
8 | 108 | - | hh:mm:ss
- | 9 or 109 (*) | Default + milliseconds | mon dd yyyy hh:mi:ss:mmmAM (or PM)
10 | 110 | USA | mm-dd-yy
11 | 111 | JAPAN | yy/mm/dd
12 | 112 | ISO | yymmdd
- | 13 or 113 (*) | Europe default + milliseconds | dd mon yyyy hh:mm:ss:mmm(24h)
14 | 114 | - | hh:mi:ss:mmm(24h)
- | 20 or 120 (*) | ODBC canonical | yyyy-mm-dd hh:mi:ss(24h)
- | 21 or 121 (*) | ODBC canonical (with milliseconds) | yyyy-mm-dd hh:mi:ss.mmm(24h)



Style 112 (YYYYMMDD) is the one used most often, but depending on the format style number you can output dates in a variety of formats.
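For example, a quick check with CONVERT (a sketch; the literal outputs in the comments assume the same November 12, 2007 timestamp as the samples below):

-- Style 112 gives YYYYMMDD; style 121 gives the ODBC canonical form with milliseconds.
SELECT CONVERT(VARCHAR(8),  GETDATE(), 112) AS Yyyymmdd,       -- 20071112
       CONVERT(VARCHAR(23), GETDATE(), 121) AS OdbcCanonical;  -- 2007-11-12 11:58:14.013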

Samples:
100 : 11 12 2007 11:58AM           
101 : 11/12/2007                   
102 : 2007.11.12                   
103 : 12/11/2007                   
104 : 12.11.2007                   
105 : 12-11-2007                   
106 : 12 11 2007                   
107 : 11 12, 2007                  
108 : 11:58:14                     
109 : 11 12 2007 11:58:14:013AM    
110 : 11-12-2007                   
111 : 2007/11/12                   
112 : 20071112                     
113 : 12 11 2007 11:58:14:013      
114 : 11:58:14:013                 
120 : 2007-11-12 11:58:14          
121 : 2007-11-12 11:58:14.013