A data cube is a multidimensional representation of aggregate values. Values higher in a dimension's hierarchy are aggregations of the values beneath them; for example, a monthly sales total is the sum of that month's daily totals. This organization makes the data easy to navigate with operations such as drill down (moving to finer detail) and roll up (moving to coarser summaries). Data cubes are typically presented to the users of decision support systems. Although it is called a "cube," the data can have two, three, or many more dimensions. An example is the NASA Ames Research Center supercomputer Pleiades, which uses a 10-dimensional hypercube to represent data and process teraflop-scale supercomputing queries on specific data sets. Each cell of the cube holds measures of interest, such as the count of a combination of data occurrences, and statistics such as the sum or mean (average) of a specific value or attribute.
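The relationship between levels of a hierarchy, and the roll-up and drill-down operations that move between them, can be sketched in a few lines of Python. This is a minimal illustration, not a production OLAP engine: the sales facts, the day/month/year hierarchy, and the helper names `aggregate`, `roll_up`, and `drill_down` are all hypothetical, and SUM is the only measure.

```python
from collections import defaultdict

# Hypothetical base facts: (day, city, units_sold), with days as ISO date strings.
FACTS = [
    ("2024-01-05", "Boston", 3),
    ("2024-01-20", "Boston", 2),
    ("2024-02-03", "Boston", 4),
    ("2024-01-11", "Denver", 1),
    ("2024-02-14", "Denver", 5),
]

# Hypothetical time hierarchy day -> month -> year, expressed as string prefixes.
LEVELS = {
    "day": lambda key: key[:10],
    "month": lambda key: key[:7],
    "year": lambda key: key[:4],
}


def aggregate(facts, level):
    """Sum units_sold per (time value at `level`, city) directly from the base facts."""
    totals = defaultdict(int)
    for day, city, units in facts:
        totals[(LEVELS[level](day), city)] += units
    return dict(totals)


def roll_up(cells, level):
    """Roll already-aggregated cells up to a coarser time level.

    This is valid because SUM is distributive: a month total equals the sum
    of its day totals, so the base facts are not needed.
    """
    totals = defaultdict(int)
    for (time_key, city), units in cells.items():
        totals[(LEVELS[level](time_key), city)] += units
    return dict(totals)


def drill_down(facts, month, city):
    """Expand one (month, city) cell into the day-level cells it summarizes."""
    daily = aggregate(facts, "day")
    return {key: units for key, units in daily.items()
            if key[0][:7] == month and key[1] == city}


daily = aggregate(FACTS, "day")
monthly = roll_up(daily, "month")
yearly = roll_up(monthly, "year")
print(monthly[("2024-01", "Boston")])           # 5
print(yearly)                                    # {('2024', 'Boston'): 9, ('2024', 'Denver'): 6}
print(drill_down(FACTS, "2024-01", "Boston"))   # {('2024-01-05', 'Boston'): 3, ('2024-01-20', 'Boston'): 2}
```

Rolling up from the monthly cells gives the same yearly totals as aggregating the raw facts, which is exactly the property that lets higher levels of the hierarchy be built from lower ones.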
There are three strategies for pre-computing (materializing) the cells of a data cube, and designers must decide which one will most efficiently supply data to their decision support systems.
The three strategies are as follows:
A. Pre-compute all cells in the cube
B. Pre-compute no cells
C. Pre-compute some of the cells
The major advantage of pre-computing the whole cube is that queries against it are answered very quickly; the price is a large amount of memory and storage. A cube over n dimensions (ignoring hierarchies) contains 2^n group-bys, or cuboids, and every record in the underlying data contributes to one cell in each of them. The size of a data cube therefore grows exponentially with the number of dimensions and roughly linearly with the number of records in the data set, which is why fully computing cubes with many dimensions demands so many computing resources.
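Full pre-computation can be sketched as computing SUM over every subset of the dimensions in a single pass over the data. This is a naive illustration under stated assumptions: the three-row data set, the dimension names, and the `ALL` marker for an aggregated-away dimension are hypothetical, and real systems use far more efficient cube-computation algorithms.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for a dimension that has been aggregated away

# Hypothetical base data: three dimensions and one measure ("sales").
DIMS = ("region", "product", "year")
ROWS = [
    {"region": "East", "product": "A", "year": 2023, "sales": 10},
    {"region": "East", "product": "B", "year": 2023, "sales": 5},
    {"region": "West", "product": "A", "year": 2024, "sales": 7},
]


def full_cube(rows, dims, measure):
    """Pre-compute SUM(measure) for every cell of all 2**len(dims) cuboids."""
    cube = defaultdict(int)
    n = len(dims)
    for row in rows:
        # Each row contributes to exactly one cell in every cuboid (subset of dims).
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                key = tuple(row[dims[i]] if i in kept else ALL for i in range(n))
                cube[key] += row[measure]
    return dict(cube)


cube = full_cube(ROWS, DIMS, "sales")
print(cube[(ALL, ALL, ALL)])       # 22 (grand total)
print(cube[("East", ALL, ALL)])    # 15
print(2 ** len(DIMS), len(cube))   # 8 18  (8 cuboids, 18 non-empty cells)
```

Because each row touches 2^n cells, the number of non-empty cells is at most (number of rows) × 2^n: adding a row adds a bounded number of cells, while adding a dimension doubles the number of cuboids.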
Alternatively, none of the cells of the data cube are computed in advance, which frees memory and storage. The major disadvantage is that queries take much longer to process, because every aggregate must be computed from the base data at query time; the time grows with the size of the data and the number of dimensions involved.
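With no pre-computation, every query becomes a scan and group-by over the base records. The sketch below assumes a hypothetical `query` helper and the same kind of hypothetical sales records; it stores nothing between calls, so each query pays the full cost of reading the data.

```python
from collections import defaultdict

# Hypothetical base records; nothing is pre-aggregated.
ROWS = [
    {"region": "East", "product": "A", "year": 2023, "sales": 10},
    {"region": "East", "product": "B", "year": 2023, "sales": 5},
    {"region": "West", "product": "A", "year": 2024, "sales": 7},
]


def query(rows, group_by, measure, filters=None):
    """Compute SUM(measure) grouped by `group_by` by scanning every base row.

    No results are cached, so every call costs a full pass over `rows`.
    """
    filters = filters or {}
    result = defaultdict(int)
    for row in rows:
        if all(row[dim] == value for dim, value in filters.items()):
            result[tuple(row[dim] for dim in group_by)] += row[measure]
    return dict(result)


print(query(ROWS, ("region",), "sales"))                   # {('East',): 15, ('West',): 7}
print(query(ROWS, ("product",), "sales", {"year": 2023}))  # {('A',): 10, ('B',): 5}
```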
In practice, most decision support systems compromise by pre-computing only the cells of the data cube that most queries will use. This is a classic space-time tradeoff: spending some storage on selected aggregates buys faster answers for the most common queries. It lets system designers balance query speed against the memory and storage available in their IT infrastructure.
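Partial pre-computation can be sketched as storing only a chosen set of cuboids and answering each query from the smallest stored cuboid that contains all of the query's dimensions. This is a simplified illustration: the set of cuboids to materialize is hard-coded here, whereas real systems typically choose it from workload statistics or a cost model, and all helper names (`project`, `base_cuboid`, `materialize`, `answer`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical base data (three dimensions, one measure).
DIMS = ("region", "product", "year")
ROWS = [
    {"region": "East", "product": "A", "year": 2023, "sales": 10},
    {"region": "East", "product": "B", "year": 2023, "sales": 5},
    {"region": "West", "product": "A", "year": 2024, "sales": 7},
]


def project(cells, src_dims, target_dims):
    """Aggregate cells keyed by src_dims down to target_dims (target must be a subset)."""
    positions = [src_dims.index(d) for d in target_dims]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)


def base_cuboid(rows, dims, measure):
    """The finest-grained cuboid: SUM(measure) per full combination of dims."""
    cells = defaultdict(int)
    for row in rows:
        cells[tuple(row[d] for d in dims)] += row[measure]
    return dict(cells)


def materialize(base, dims, chosen):
    """Pre-compute only the chosen cuboids, keeping the base cuboid as a fallback."""
    views = {dims: base}
    for view_dims in chosen:
        views[view_dims] = project(base, dims, view_dims)
    return views


def answer(views, group_by):
    """Answer SUM grouped by `group_by` from the smallest materialized cuboid covering it."""
    covering = [v for v in views if set(group_by) <= set(v)]
    source = min(covering, key=lambda v: len(views[v]))
    return project(views[source], source, group_by), source


base = base_cuboid(ROWS, DIMS, "sales")
views = materialize(base, DIMS, [("region",), ("region", "product")])
print(answer(views, ("region",)))  # ({('East',): 15, ('West',): 7}, ('region',))
print(answer(views, ("year",)))    # ({(2023,): 15, (2024,): 7}, ('region', 'product', 'year'))
```

A query on `region` is served directly from a small stored cuboid, while a query on `year`, which no chosen cuboid covers, falls back to aggregating the base cuboid; that difference is the space-time tradeoff in miniature.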
To understand data cubes, it is important to know that they are composed of two elements: dimensions, the axes (such as time, product, or region) along which the data is organized, and measures, the numeric values (such as sales amounts) that are aggregated. Cubes with relatively few dimensions are most often fully pre-computed. This information is useful for analyzing sales figures, demographics, supply chain deviation, and customer demand. In addition to answering queries, the data in the cube can be analyzed to reveal trends or drilled into to find more specific answers to questions. This helps users of decision support systems forecast trends, use company resources more efficiently, and understand overall business operations. In the scientific community, data cubes are often used with many dimensions and require vast computing resources to produce simulation data, such as for the entry of a vehicle into the Martian atmosphere, the prediction of hurricane formation, or the modeling of aircraft aerodynamics to make aircraft more streamlined.
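The split between dimensions and measures can be sketched as a mapping from a tuple of dimension values to the measures stored for that cell. In this hypothetical example each cell keeps a count and a sum, and the mean (average) is derived from them; `dice` restricts cells to allowed dimension values, and restricting one dimension to a single value is a slice. The dimensions, sample rows, and helper names are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical dimensions; "amount" is the measure being aggregated.
DIMENSIONS = ("region", "product", "quarter")
SALES = [
    {"region": "East", "product": "A", "quarter": "Q1", "amount": 100.0},
    {"region": "East", "product": "A", "quarter": "Q1", "amount": 50.0},
    {"region": "East", "product": "B", "quarter": "Q2", "amount": 80.0},
    {"region": "West", "product": "A", "quarter": "Q2", "amount": 60.0},
]


@dataclass
class Measures:
    """Measures stored in one cell: count and sum; the mean is derived from them."""
    count: int = 0
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def mean(self):
        return self.total / self.count if self.count else None


def build_cells(rows):
    """Map each combination of dimension values to its Measures."""
    cells = {}
    for row in rows:
        key = tuple(row[d] for d in DIMENSIONS)
        cells.setdefault(key, Measures()).add(row["amount"])
    return cells


def dice(cells, **allowed):
    """Keep cells whose dimension values fall in the allowed sets.

    Restricting a single dimension to a single value is a slice.
    """
    checks = {DIMENSIONS.index(dim): set(values) for dim, values in allowed.items()}
    return {key: m for key, m in cells.items()
            if all(key[i] in ok for i, ok in checks.items())}


cells = build_cells(SALES)
east_a_q1 = cells[("East", "A", "Q1")]
print(east_a_q1.count, east_a_q1.total, east_a_q1.mean)  # 2 150.0 75.0
print(sorted(dice(cells, quarter=["Q2"])))                # [('East', 'B', 'Q2'), ('West', 'A', 'Q2')]
print(sorted(dice(cells, region=["East"], product=["A", "B"])))
```

Storing count and sum rather than the mean itself keeps the measures combinable: two cells can be merged by adding their counts and sums, whereas averaging two means would be wrong whenever the cells hold different numbers of records.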
Examples of Data Cubes Utilized in Real World Applications:
Automotive Data Integration Data Cube Illustration Example
NASA Ames Research Center Utilizes Arithmetic Data Cube for Supercomputer Benchmarking
NASA Spectral Image Data Cubes