Blog

Visible Data Patterns

Sat, April 23, 2011

For 40 years BI propaganda has led us to believe that statistical modeling can solve business problems, but the truth is that it can also hide Data Patterns and mask Business Problems.

Let’s talk about situations where Visualization can show Data Patterns while statistics looks like a big lie, even with small datasets!

I also noticed the huge attention Forrester pays to Advanced Data Visualization, probably for 4 good reasons:

  • data visualization can fit far more data points (tens of thousands) onto one screen or page than numerical information in a datagrid (hundreds of datapoints per screen);
  • the ability to visually drill down and zoom through interactive and synchronized charts;
  • the ability to convey the story behind the data to a wider audience through data visualization;
  • analysts and decision makers cannot see patterns (and in many cases also trends and outliers) in data without data visualization. A classic example, more than 37 years old, is Anscombe’s quartet: four datasets that have identical simple statistical properties, yet appear very different when visualized. F.J. Anscombe constructed them to demonstrate the importance of Data Visualization.

For all four (blue, orange, green and yellow) datasets we have these statistical properties:

Property                                   Value
Mean of x in each case                     9 (exact)
Variance of x in each case                 10 (exact; population variance)
Mean of y in each case                     7.50 (to 2 d.p.)
Variance of y in each case                 3.75 (to 2 d.p.; population variance)
Correlation between x and y in each case   0.816 (to 3 d.p.)
Linear regression line in each case        y = 3.00 + 0.500x (to 2 d.p. and 3 d.p. respectively)
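The values in the table can be checked with a few lines of Python. This is only a sketch, not the tool used for this post: it uses the standard library, the commonly published Anscombe data, and the usual labels I to IV (which label corresponds to which color in the charts is not assumed here). The names `QUARTET`, `summarize` and `SUMMARY` are made up for the illustration. Variances are computed as population (divide-by-n) variances, since that is the convention that reproduces 10 and 3.75.

```python
from statistics import mean, pvariance

# Anscombe's quartet as commonly published; labels I-IV are the usual ones.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics from the table for one dataset."""
    mx, my = mean(xs), mean(ys)
    # Population (divide-by-n) variances, matching the table's 10 and 3.75.
    vx, vy = pvariance(xs), pvariance(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    slope = cov / vx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "r": cov / (vx * vy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Running it prints one row per dataset, and the four rows agree with each other up to small rounding differences, which is exactly why the numbers alone tell you nothing about the shape of the data.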

But Data Visualization of these 4 datasets shows completely different Data Patterns:
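Even a crude text scatter plot is enough to see the difference. The sketch below is an illustration only, not the charts from this post: it draws each dataset on a character grid using nothing but the standard library. The function `ascii_scatter`, the grid size and the axis ranges are all assumptions picked for this example, and the data is the commonly published Anscombe quartet with its usual labels I to IV.

```python
# Anscombe's quartet as commonly published; labels I-IV are the usual ones.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=32, height=12, x_range=(4, 19), y_range=(3, 13)):
    """Render points as '*' on a character grid (a rough stand-in for a chart)."""
    grid = [[" "] * width for _ in range(height)]
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(f"Dataset {name}")
        print(ascii_scatter(xs, ys))
        print("+" + "-" * 32)
```

With the same axes for all four plots, dataset I looks like a noisy upward trend, II bends into a smooth curve, III is a tight line with a single outlier, and IV is a vertical stack of points plus one point far to the right.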

I will leave to your imagination what Business Analysts and Decision Makers can miss if corporate datasets have, instead of 11 datapoints, say 11 million or 11 billion datapoints! And guess what: most modern companies have many more than 11 datapoints to analyze and Visualize. PCA is helping such companies every day!

Andrei Pandre, Ph.D., VP of Data Visualization, Practical Computer Applications, Inc.
