Visualizing Decision Problems within Goal Models: an Exploratory Experiment (SAC18)

Sotirios Liaskos, Teodora Dundjerovic, Grace Gabriel

14 December, 2017

Abstract

This page accompanies the SAC 2018 paper submission. The data, redacted to include only information relevant to this study and to remove personally identifying information, can be found here. It is shared in compliance with the clause of the informed consent stating that “non-identifying anonymous responses […] may be used for research publications and open sharing within the research community”.

Data Set Preparation

Filtering of Cases

We first remove participants with color blindness. We then filter out participants who did not perform well in the domain and conceptual exercises. DomainTest.Total has a maximum of 15 and Concept.Test a maximum of 7. Depending on the desired “cleanness” level, we filter out those scoring 12 or below and 4 or below (on DomainTest.Total and Concept.Test, respectively) or, at the stricter level, those scoring 14 or below and 6 or below.
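A minimal base-R sketch of this filtering step follows. Only the column names DomainTest.Total and Concept.Test come from the text; the ColorBlind column, the helper name filter_participants, and the reading that a participant is dropped when either score is at or below its cutoff are assumptions made for illustration.

```r
# Hypothetical helper (not the authors' code): drop colour-blind participants,
# then anyone whose domain OR concept score is at or below the chosen cutoff.
# ColorBlind is an assumed logical column; the "either score" rule is an assumption.
filter_participants <- function(d, domain_cut = 12, concept_cut = 4) {
  d <- d[!d$ColorBlind, , drop = FALSE]
  keep <- d$DomainTest.Total > domain_cut & d$Concept.Test > concept_cut
  d[keep, , drop = FALSE]
}

# Toy data (not the study data).
d <- data.frame(id = 1:5,
                DomainTest.Total = c(15, 13, 12, 14, 15),
                Concept.Test     = c(7, 5, 6, 4, 6),
                ColorBlind       = c(FALSE, FALSE, FALSE, FALSE, TRUE))
filter_participants(d)$id         # lenient level (12 / 4): 1 2
filter_participants(d, 14, 6)$id  # strict level (14 / 6): 1
```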

Data Characteristics

Number of Cases

## myData$Group: Diag.
## [1] 40
## -------------------------------------------------------- 
## myData$Group: Chart
## [1] 38
## -------------------------------------------------------- 
## myData$Group: Tree
## [1] 38

Homogeneity of Variance

The following are Levene tests (centered at the median) for homogeneity of variance. Since most of them are significant, we generally assume heteroskedasticity. This is addressed in the simple-effects analyses through transformations.
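For readers who want to see what a median-centered Levene test computes, here is a base-R sketch: it is a one-way ANOVA on the absolute deviations of each observation from its group median (the Brown-Forsythe variant), which to our understanding is also what car::leveneTest(..., center = median) does. The helper name levene_median and the simulated data are illustrative only.

```r
# Levene's test centred at the median (Brown-Forsythe variant):
# a one-way ANOVA on absolute deviations from each group's median.
levene_median <- function(y, group) {
  group <- factor(group)
  dev <- abs(y - ave(y, group, FUN = median))
  tab <- anova(lm(dev ~ group))
  c(F = tab[["F value"]][1], df1 = tab$Df[1], df2 = tab$Df[2],
    p = tab[["Pr(>F)"]][1])
}

# Simulated example with the study's layout: 40/38/38 participants x 3 levels.
set.seed(42)
grp <- factor(rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38) * 3))
y <- rnorm(length(grp), sd = ifelse(grp == "Diag.", 2, 1))
levene_median(y, grp)  # df1 = 2, df2 = 345, matching the "by Group" tables
```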

## [1] "Rank by Group:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   2  9.9142 6.509e-05 ***
##       345                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Top by Group:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value   Pr(>F)   
## group   2  6.5524 0.001611 **
##       345                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Time by Group:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   2  21.503 1.583e-09 ***
##       345                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Confidence by Group:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value   Pr(>F)   
## group   2  5.9932 0.002763 **
##       345                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Rank by Group and Complexity:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   8  6.7036 3.705e-08 ***
##       339                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Top by Group and Complexity:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   8  1.4713 0.1664
##       339

## [1] "Time by Group and Complexity:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   8  4.6895 1.837e-05 ***
##       339                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] "Confidence by Group and Complexity:"

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   8  4.0848 0.0001158 ***
##       339                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Transformation

Possible Transformations

trans <- c(Top = 0.4,Rank = 0.4,Time = 0,Confidence = 3)
myDTr <- transform_Manual(myD,respVars,trans)
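transform_Manual is the authors' own helper and its definition is not shown here. One plausible reading, which is only an assumption, is that each value in trans is an exponent on Tukey's ladder of powers, with 0 standing for a logarithm. Under that assumption, a hypothetical stand-in could look like the sketch below; the function name power_transform and the choice of a base-10 logarithm are also guesses.

```r
# Hypothetical stand-in for transform_Manual(); not the authors' code.
# Each listed response variable is raised to its power; a power of 0 is
# treated as a base-10 logarithm (assumption).
power_transform <- function(d, vars, powers) {
  for (v in vars) {
    p <- powers[[v]]
    d[[v]] <- if (p == 0) log10(d[[v]]) else d[[v]]^p
  }
  d
}

toy <- data.frame(Time = c(10, 100, 1000), Confidence = c(1, 2, 3))
power_transform(toy, c("Time", "Confidence"), c(Time = 0, Confidence = 3))
# Time becomes 1, 2, 3; Confidence becomes 1, 8, 27
```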

Analysis Overview

General

The experiment follows a $3\times 3$ mixed design, with the choice of visualization as the between-subjects factor and complexity as the within-subjects factor. All effects below are reported as significant when $p<.05$.
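The omnibus tables below are headed "Type II Repeated Measures MANOVA Tests: Pillai test statistic", which appears to be output from car::Anova. As a sketch of what such tests compute for this design, the three effects can be obtained in base R: the visualization effect is a one-way ANOVA on each participant's mean score, while complexity and the interaction come from a MANOVA on within-subject difference scores, whose (Intercept) row tests complexity and whose group row tests the interaction. Because Pillai's trace is invariant to nonsingular transformations of the responses, any full-rank set of within-subject contrasts gives the same test. The degrees of freedom match the tables below; with unequal group sizes, the (Intercept) statistic may differ slightly from car's Type II value. The helper name rm_manova and the simulated data are illustrative only.

```r
# Sketch of a 3 (between) x 3 (within) mixed-design omnibus analysis.
# Y: one row per participant, columns = Simple, Medium, Complex.
rm_manova <- function(Y, group) {
  group <- factor(group)
  # Visualization main effect: one-way ANOVA on each participant's mean.
  between <- anova(lm(rowMeans(Y) ~ group))
  # Complexity and Group:Complexity: MANOVA on two difference scores.
  D <- cbind(Y[, 2] - Y[, 1], Y[, 3] - Y[, 2])
  within <- summary(manova(D ~ group), test = "Pillai", intercept = TRUE)$stats
  list(between = between, within = within)
}

# Simulated data with the study's group sizes (not the study data).
set.seed(1)
group <- rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38))
Y <- matrix(rnorm(116 * 3), ncol = 3,
            dimnames = list(NULL, c("Simple", "Medium", "Complex")))
res <- rm_manova(Y, group)
res$between  # Df 2 and 113, as for "Group" in the tables
res$within   # "(Intercept)": num Df 2, den Df 112; "group": num Df 4, den Df 226
```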

Omnibus

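For each response variable, we report three kinds of omnibus tests: a Type II repeated-measures MANOVA (Pillai's trace) for visualization, complexity and their interaction; Welch's W test for the main effect of visualization, followed by two contrasts (diagrams against the two spatial visualizations combined, and charts against treemaps); and a Kruskal-Wallis test for the main effect of visualization, followed by post-hoc comparisons of each spatial visualization against diagrams.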
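Both between-subjects omnibus tests are available in base R: Welch's W is what stats::oneway.test() computes with var.equal = FALSE, and stats::kruskal.test() gives the Kruskal-Wallis H. The sketch below runs both on simulated scores; the helper name omnibus_between and the use of one aggregate score per participant are assumptions for illustration.

```r
# Welch's W and Kruskal-Wallis H for one score per participant (sketch).
omnibus_between <- function(score, group) {
  group <- factor(group)
  welch <- oneway.test(score ~ group, var.equal = FALSE)  # Welch's W
  kw <- kruskal.test(score ~ group)                       # Kruskal-Wallis H
  c(W = unname(welch$statistic),
    df1 = unname(welch$parameter[1]), df2 = unname(welch$parameter[2]),
    p_W = welch$p.value,
    H = unname(kw$statistic), p_H = kw$p.value)
}

# Simulated, heteroskedastic scores (not the study data).
set.seed(3)
group <- rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38))
score <- rnorm(116, mean = ifelse(group == "Diag.", 0, 0.5),
               sd = ifelse(group == "Diag.", 2, 1))
round(omnibus_between(score, group), 3)
```

Unlike the classic F test, Welch's denominator degrees of freedom (df2) are estimated from the group variances, which is why the tables report non-integer values such as 72.71.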

Simple Effects for Interactions

For each level of complexity (simple, medium, complex), we perform the following tests:

1. Omnibus between-subjects tests for that complexity level:
   - standard parametric (linear model);
   - Kruskal-Wallis.
2. Contrasts (each spatial visualization against diagrams):
   - standard parametric, with confidence intervals (Dunnett);
   - multiple comparisons after Kruskal-Wallis.
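The per-level procedure can be sketched in base R by splitting the long-format data by complexity and, within each slice, running a Kruskal-Wallis test and a linear model with diagrams as the reference group. The Dunnett-adjusted p-values and family-wise intervals in the output below appear to come from multcomp::glht() with mcp(Group = "Dunnett"), and the rank comparisons from a kruskalmc-style procedure; this sketch reports only unadjusted coefficient tests and does not reproduce those adjustments. The helper name simple_effects and the column names are assumptions.

```r
# Simple effects sketch: for each complexity level, test the visualization
# factor with Kruskal-Wallis and with a linear model using Diag. as baseline.
simple_effects <- function(d, response) {
  d$Group <- relevel(factor(d$Group), ref = "Diag.")
  f <- reformulate("Group", response = response)
  lapply(split(d, d$Complexity), function(slice) {
    kw <- kruskal.test(f, data = slice)
    fit <- lm(f, data = slice)
    list(kruskal_p = kw$p.value,
         vs_diag = coef(summary(fit))[-1, , drop = FALSE])  # unadjusted
  })
}

# Simulated long-format data: 116 participants x 3 complexity levels.
set.seed(11)
d <- expand.grid(Subject = 1:116,
                 Complexity = c("Simple", "Medium", "Complex"))
d$Group <- rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38))[d$Subject]
d$Rank <- rnorm(nrow(d), mean = 2.7, sd = 0.4)
res <- simple_effects(d, "Rank")
res$Medium$vs_diag  # rows GroupChart and GroupTree, each compared with Diag.
```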

Rank

Omnibus Tests

Standard Multi-Variate ANOVA

## 
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
##                  Df test stat approx F num Df den Df    Pr(>F)    
## (Intercept)       1   0.75535   348.88      1    113 < 2.2e-16 ***
## Group             2   0.07069     4.30      2    113  0.015892 *  
## Complexity        1   0.72223   145.61      2    112 < 2.2e-16 ***
## Group:Complexity  2   0.12449     3.75      4    226  0.005636 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Welch’s W Test - Omni

## **** Welch W Test:   W(2,72.71)=5.71, p=0.005

Welch’s W Test - Contrasts

## Unadjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :3.057
##   F-Crit :5.16 (1/113)
##   p.val  :0.1662
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :5.538
##   F-Crit :5.16 (1/113)
##   p.val  :0.0407
##   -------:
## 
## Adjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :3.096
##   F-Crit :3.963 (1/78.603)
##   p.val  :0.0824
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :5.461
##   F-Crit :3.996 (1/61.85)
##   p.val  :0.0227
##   -------:

Kruskal Wallis - Omni

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Rank by Group
## Kruskal-Wallis chi-squared = 13.356, df = 2, p-value = 0.001259

Kruskal Wallis - Posthocs

## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 47.36118     29.49147       TRUE
## Diag.-Tree  18.94452     29.49147      FALSE

Summary

##      (Intercept)            Group       Complexity Group:Complexity 
##                1                2                1                2

In terms of ranking identification, there is a significant ($p<0.05$) main effect of the chosen visualization: Kruskal-Wallis $H(2)=13.36, p=0.001$ (Welch’s $W(2,72.71)=5.71, p=0.005$). In other words, the choice of visualization affects the participants' ability to identify the rankings of optimal solutions. The level of complexity also has a highly significant effect on ranking identification, $F(2,112)=145.61, p<0.001$. Interestingly, visualization and complexity level also show a statistically significant interaction, $F(4,226)=3.75, p<0.01$: the effect of complexity on success rate differs across visualizations.

Interactions / Simple Effects

Kruskal and Dunnett Comparisons

##  ***************************************
##  ****** COMPLEXITY LEVEL: Simple *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   2  2.7812 0.06621 .
##       113                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Rank by Group
## Kruskal-Wallis chi-squared = 10.353, df = 2, p-value = 0.005647
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##               obs.dif critical.dif difference
## Diag.-Chart 15.996053     17.07563      FALSE
## Diag.-Tree   8.135526     17.07563      FALSE
## 
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.34009 -0.22987  0.03744  0.34124  0.67980 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.61790    0.07431  35.227   <2e-16 ***
## GroupChart   0.21924    0.10647   2.059   0.0418 *  
## GroupTree   -0.11932    0.10647  -1.121   0.2648    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.47 on 113 degrees of freedom
## Multiple R-squared:  0.08246,    Adjusted R-squared:  0.06622 
## F-statistic: 5.077 on 2 and 113 DF,  p-value: 0.007734
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Rank ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0   0.2192     0.1065   2.059   0.0764 .
## Tree - Diag. == 0   -0.1193     0.1065  -1.121   0.4299  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Medium *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   2  2.5374 0.08357 .
##       113                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Rank by Group
## Kruskal-Wallis chi-squared = 11.895, df = 2, p-value = 0.002612
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 24.44079     17.07563       TRUE
## Diag.-Tree  20.20395     17.07563       TRUE
## 
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.47999 -0.12270  0.09012  0.18871  0.37624 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.76448    0.05346  51.708   <2e-16 ***
## GroupChart   0.17760    0.07660   2.319   0.0222 *  
## GroupTree    0.03766    0.07660   0.492   0.6239    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3381 on 113 degrees of freedom
## Multiple R-squared:  0.0497, Adjusted R-squared:  0.03288 
## F-statistic: 2.955 on 2 and 113 DF,  p-value: 0.05613
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Rank ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0  0.17760    0.07660   2.319   0.0415 *
## Tree - Diag. == 0   0.03766    0.07660   0.492   0.8412  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Complex *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.9907 0.3745
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Rank by Group
## Kruskal-Wallis chi-squared = 15.096, df = 2, p-value = 0.0005271
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 20.63092     17.07563       TRUE
## Diag.-Tree  25.19671     17.07563       TRUE
## 
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64361 -0.04747  0.08036  0.12547  0.17594 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.00243    0.03283  91.467   <2e-16 ***
## GroupChart   0.09558    0.04703   2.032   0.0445 *  
## GroupTree    0.05047    0.04703   1.073   0.2854    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2076 on 113 degrees of freedom
## Multiple R-squared:  0.03534,    Adjusted R-squared:  0.01827 
## F-statistic:  2.07 on 2 and 113 DF,  p-value: 0.1309
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Rank ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0  0.09558    0.04703   2.032   0.0811 .
## Tree - Diag. == 0   0.05047    0.04703   1.073   0.4591  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Summary

To examine these interactions further, we follow the simple-effects approach described above: we fix each level of the complexity factor and study the effect of the visualization factor. For simple models, neither spatial visualization (treemap or chart) leads to significantly better performance than goal diagrams; charts do so only with marginal significance ($p=0.076$, Dunnett post-hoc, Chart - Diag. == 0 comparison for Simple). For medium-size goal models, however, charts are significantly more effective than diagrams ($p=0.04$, same Dunnett comparison for Medium). For complex models, charts again appear to perform better than diagrams, although the difference falls slightly beyond our $0.05$ threshold ($p=0.08$ in the Dunnett tests). The 95% family-wise confidence intervals of the Figure above shed more light on these effects. {} and {} represent charts, treemaps and goal diagrams, respectively. We can thus be reasonably confident that charts are consistently better than diagrams, while the results for treemaps remain inconclusive.

Interaction Diagrams

Does complexity level negatively affect success in rank identification? The answer appears to be no. As the Figure below shows, success increases with complexity for all visualizations, suggesting that as participants become more familiar with the visualization, model size does not deter them from finding the correct answers.

Top

Omnibus Tests

Standard Multi-Variate ANOVA

## 
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
##                  Df test stat approx F num Df den Df    Pr(>F)    
## (Intercept)       1   0.91818  1268.10      1    113 < 2.2e-16 ***
## Group             2   0.07656     4.68      2    113 0.0111075 *  
## Complexity        1   0.45403    46.57      2    112 1.912e-15 ***
## Group:Complexity  2   0.17615     5.46      4    226 0.0003272 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Welch’s W Test - Omni

## **** Welch W Test:   W(2,73.04)=6.32, p=0.0029

Welch’s W Test - Contrasts

## Unadjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :6.602
##   F-Crit :5.16 (1/113)
##   p.val  :0.023
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :2.766
##   F-Crit :5.16 (1/113)
##   p.val  :0.1981
##   -------:
## 
## Adjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :6.974
##   F-Crit :3.956 (1/82.989)
##   p.val  :0.0099
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :2.611
##   F-Crit :3.994 (1/62.572)
##   p.val  :0.1112
##   -------:

Kruskal Wallis - Omni

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Top by Group
## Kruskal-Wallis chi-squared = 19.269, df = 2, p-value = 6.543e-05

Kruskal Wallis - Posthocs

## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 56.06053     29.49147       TRUE
## Diag.-Tree  35.56930     29.49147       TRUE

Summary

If we restrict our focus to how often the participants’ top response matches that of the evaluation algorithm, we see similar results. There are significant main effects of both the visualization, Kruskal-Wallis $H(2)=19.27, p<0.001$ (Welch’s $W(2,73.04)=6.32, p=0.003$), and the complexity level, $F(2,112)=46.57, p<0.001$, as well as a significant interaction, $F(4,226)=5.46, p<0.001$.

Interactions / Simple Effects

Kruskal and Dunnett Comparisons

##  ***************************************
##  ****** COMPLEXITY LEVEL: Simple *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   2  2.8179 0.06394 .
##       113                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Top by Group
## Kruskal-Wallis chi-squared = 10.353, df = 2, p-value = 0.005647
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##               obs.dif critical.dif difference
## Diag.-Chart 15.996053     17.07563      FALSE
## Diag.-Tree   8.135526     17.07563      FALSE
## 
## Call:
## lm(formula = Top ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.73686 -0.17486  0.02834  0.25849  0.51395 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.98477    0.05603  35.424   <2e-16 ***
## GroupChart   0.16604    0.08027   2.068   0.0409 *  
## GroupTree   -0.08943    0.08027  -1.114   0.2676    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3544 on 113 degrees of freedom
## Multiple R-squared:  0.08265,    Adjusted R-squared:  0.06642 
## F-statistic: 5.091 on 2 and 113 DF,  p-value: 0.007642
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Top ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0  0.16604    0.08027   2.068   0.0749 .
## Tree - Diag. == 0  -0.08943    0.08027  -1.114   0.4340  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Medium *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2   1.826 0.1658
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Top by Group
## Kruskal-Wallis chi-squared = 18.272, df = 2, p-value = 0.0001077
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 27.31776     17.07563       TRUE
## Diag.-Tree  28.27829     17.07563       TRUE
## 
## Call:
## lm(formula = Top ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82220 -0.17079  0.08541  0.22927  0.47878 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.81977    0.06113  29.771  < 2e-16 ***
## GroupChart   0.25620    0.08757   2.925  0.00416 ** 
## GroupTree    0.16093    0.08757   1.838  0.06875 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3866 on 113 degrees of freedom
## Multiple R-squared:  0.07211,    Adjusted R-squared:  0.05569 
## F-statistic: 4.391 on 2 and 113 DF,  p-value: 0.01457
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Top ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)   
## Chart - Diag. == 0  0.25620    0.08757   2.925  0.00801 **
## Tree - Diag. == 0   0.16093    0.08757   1.838  0.12324   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Complex *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  1.5121 0.2249
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Top by Group
## Kruskal-Wallis chi-squared = 15.096, df = 2, p-value = 0.0005271
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 20.63092     17.07563       TRUE
## Diag.-Tree  25.19671     17.07563       TRUE
## 
## Call:
## lm(formula = Top ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91318 -0.02263  0.13896  0.33763  0.33763 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.09960    0.08158  25.736   <2e-16 ***
## GroupChart   0.17074    0.11688   1.461    0.147    
## GroupTree   -0.02793    0.11688  -0.239    0.812    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.516 on 113 degrees of freedom
## Multiple R-squared:  0.02853,    Adjusted R-squared:  0.01134 
## F-statistic: 1.659 on 2 and 113 DF,  p-value: 0.1949
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Top ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0  0.17074    0.11688   1.461    0.252
## Tree - Diag. == 0  -0.02793    0.11688  -0.239    0.959
## (Adjusted p values reported -- single-step method)

Summary

Moving on to simple effects, the Figures above show the confidence intervals comparing visualizations at each complexity level. As with rank identification, charts appear more effective than diagrams for simple models at a near-significant level (Dunnett $p=0.075$), and for medium complexity the effect is statistically significant ($p=0.008$). For complex models, however, the difference between charts and diagrams is not significant ($p=0.25$).

Interaction Diagrams

Time

Omnibus Tests

Standard Multi-Variate ANOVA

## 
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
##                  Df test stat approx F num Df den Df    Pr(>F)    
## (Intercept)       1   0.82439   530.47      1    113 < 2.2e-16 ***
## Group             2   0.20535    14.60      2    113 2.290e-06 ***
## Complexity        1   0.17298    11.71      2    112 2.403e-05 ***
## Group:Complexity  2   0.08117     2.39      4    226   0.05173 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Welch’s W Test - Omni

## **** Welch W Test:   W(2,70.3)=15.36, p=0

Welch’s W Test - Contrasts

## Unadjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :24.654
##   F-Crit :5.16 (1/113)
##   p.val  :0
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :4.547
##   F-Crit :5.16 (1/113)
##   p.val  :0.0703
##   -------:
## 
## Adjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :19.364
##   F-Crit :4.009 (1/57.302)
##   p.val  :0
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :6.535
##   F-Crit :3.993 (1/62.97)
##   p.val  :0.013
##   -------:

Kruskal Wallis - Omni

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Time by Group
## Kruskal-Wallis chi-squared = 44.414, df = 2, p-value = 2.268e-10

Kruskal Wallis - Posthocs

## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 43.37632     29.49147       TRUE
## Diag.-Tree  87.68333     29.49147       TRUE

Summary

Response time is measured as the interval between the loading of the screen showing the visualization and the question, and the moment the participant clicks to proceed to the next page. We sum the response times of the nine tasks associated with each combination of visualization and complexity level, and perform our analysis on these totals.
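The aggregation step can be sketched with base R's aggregate(). The long-format layout (one row per participant and task) and the column names Subject, Group, Complexity, Task and Time are assumptions; the toy times are placeholders, not study data.

```r
# Sum the nine per-task response times for each participant x complexity
# level. Column names are assumed for illustration.
total_time <- function(task_log) {
  aggregate(Time ~ Subject + Group + Complexity, data = task_log, FUN = sum)
}

task_log <- expand.grid(Task = 1:9,
                        Complexity = c("Simple", "Medium", "Complex"),
                        Subject = 1:2)
task_log$Group <- c("Diag.", "Tree")[task_log$Subject]
task_log$Time <- 1 + task_log$Task   # placeholder seconds: 2, 3, ..., 10
totals <- total_time(task_log)
totals  # 6 rows (2 participants x 3 levels), each with Time = 54
```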

Analyzing differences in response time across visualizations, we again observe significant main effects of the chosen visualization, $H(2)=44.41, p<0.001$ (Welch’s $W(2,70.3)=15.36, p<0.001$), and of the complexity level, $F(2,112)=11.71, p<0.001$, as well as a marginally significant interaction between the two factors, $F(4,226)=2.39, p=0.052$. Confidence intervals per complexity level can be seen in the Figures above. Participants generally respond faster with treemaps and charts than with goal diagrams. This holds with statistical significance for treemaps at every complexity level, whereas for charts it is significant only for medium-complexity models (Dunnett $p=0.019$).

Interactions / Simple Effects

Kruskal and Dunnett Comparisons

##  ***************************************
##  ****** COMPLEXITY LEVEL: Simple *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.7791 0.4613
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Time by Group
## Kruskal-Wallis chi-squared = 8.6062, df = 2, p-value = 0.01353
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 12.49276     17.07563      FALSE
## Diag.-Tree  22.26908     17.07563       TRUE
## 
## Call:
## lm(formula = Time ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77840 -0.17413 -0.02576  0.17574  0.53213 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.45059    0.03918  62.541  < 2e-16 ***
## GroupChart  -0.09509    0.05614  -1.694  0.09303 .  
## GroupTree   -0.16240    0.05614  -2.893  0.00458 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2478 on 113 degrees of freedom
## Multiple R-squared:  0.06976,    Adjusted R-squared:  0.0533 
## F-statistic: 4.237 on 2 and 113 DF,  p-value: 0.01681
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Time ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)   
## Chart - Diag. == 0 -0.09509    0.05614  -1.694  0.16426   
## Tree - Diag. == 0  -0.16240    0.05614  -2.893  0.00882 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Medium *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  1.7117 0.1852
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Time by Group
## Kruskal-Wallis chi-squared = 17.154, df = 2, p-value = 0.0001883
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 19.18355     17.07563       TRUE
## Diag.-Tree  31.22303     17.07563       TRUE
## 
## Call:
## lm(formula = Time ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.75024 -0.13377  0.01302  0.15187  0.58744 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.60764    0.03671  71.025  < 2e-16 ***
## GroupChart  -0.13744    0.05260  -2.613   0.0102 *  
## GroupTree   -0.21785    0.05260  -4.142 6.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2322 on 113 degrees of freedom
## Multiple R-squared:  0.1349, Adjusted R-squared:  0.1196 
## F-statistic: 8.809 on 2 and 113 DF,  p-value: 0.0002785
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Time ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)    
## Chart - Diag. == 0  -0.1374     0.0526  -2.613 0.019369 *  
## Tree - Diag. == 0   -0.2178     0.0526  -4.142 0.000132 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Complex *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  1.9243 0.1507
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Time by Group
## Kruskal-Wallis chi-squared = 26.501, df = 2, p-value = 1.759e-06
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 14.39737     17.07563      FALSE
## Diag.-Tree  38.87105     17.07563       TRUE
## 
## Call:
## lm(formula = Time ~ Group, data = dSlice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7829 -0.1024  0.0032  0.1446  0.4745 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.58917    0.03594  72.051  < 2e-16 ***
## GroupChart  -0.09005    0.05148  -1.749    0.083 .  
## GroupTree   -0.25441    0.05148  -4.942 2.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2273 on 113 degrees of freedom
## Multiple R-squared:  0.1811, Adjusted R-squared:  0.1666 
## F-statistic: 12.49 on 2 and 113 DF,  p-value: 1.253e-05
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Time ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)    
## Chart - Diag. == 0 -0.09005    0.05148  -1.749    0.147    
## Tree - Diag. == 0  -0.25441    0.05148  -4.942  5.4e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Interaction Diagrams

The figure below shows the difference in response time, in minutes, averaged over individual tasks. In non-simple cases, treemaps take nearly half as much time as goal diagrams, and charts take from 2/3 to 3/4 as much time.

The Complexity Effect

Since there is a main effect of complexity, it is sensible to also examine its effect size in seconds. In the table below, Time is the mean response time per task in seconds and Size is the average number of contribution links in the models:

##   Complexity Visualization     Time     Size
## 1     Simple         Chart 29.21345 13.33333
## 2     Medium         Chart 36.95614 21.00000
## 3    Complex         Chart 39.28070 25.33333
## 4     Simple       Treemap 24.56140 13.33333
## 5     Medium       Treemap 29.83626 21.00000
## 6    Complex       Treemap 25.87135 25.33333
## 7     Simple  Goal Diagram 37.99722 13.33333
## 8     Medium  Goal Diagram 53.80556 21.00000
## 9    Complex  Goal Diagram 50.98333 25.33333

Averaged across the three visualizations, participants spend $9.6$ seconds more when moving from simple to medium models, but $1.5$ seconds less when moving from medium to complex models. It is instructive to look at this effect after dividing the times by the average number of contribution links (i.e., the numbers involved).

For simple models, participants spend $2.29$ seconds per contribution link. Moving from simple to medium and from medium to complex models, they spend about $0.38$ and $0.39$ seconds less per contribution link, respectively (i.e., $1.91$ and $1.53$ seconds, reductions of about $17\%$ and $20\%$ from one level to the next). A simple plot of the relationship between the number of contribution links and the time spent on each is shown below.
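The arithmetic in the two paragraphs above can be reproduced from the table with the short Python sketch below. It assumes that the `Size` column is the average number of contribution links per model at each complexity level, which matches the per-link figures quoted in the text. The helper name `complexity_steps` is hypothetical.

```python
# Mean time in seconds for (Chart, Treemap, Goal Diagram) and the average number
# of contribution links ("Size") per complexity level, copied from the table above.
LEVELS = ["Simple", "Medium", "Complex"]
TIMES = {
    "Simple": [29.21345, 24.56140, 37.99722],
    "Medium": [36.95614, 29.83626, 53.80556],
    "Complex": [39.28070, 25.87135, 50.98333],
}
SIZE = {"Simple": 13.33333, "Medium": 21.0, "Complex": 25.33333}


def complexity_steps():
    """Average over visualizations, normalize by links, and compare adjacent levels."""
    level_mean = {lv: sum(TIMES[lv]) / len(TIMES[lv]) for lv in LEVELS}
    per_link = {lv: level_mean[lv] / SIZE[lv] for lv in LEVELS}
    steps = []
    for prev, cur in zip(LEVELS, LEVELS[1:]):
        d_link = per_link[cur] - per_link[prev]
        steps.append((prev, cur,
                      level_mean[cur] - level_mean[prev],  # change in seconds
                      d_link,                             # change in seconds per link
                      100 * d_link / per_link[prev]))     # relative change in %
    return per_link, steps


per_link, steps = complexity_steps()
for prev, cur, d_time, d_link, pct in steps:
    print(f"{prev} -> {cur}: {d_time:+.1f} s, {per_link[cur]:.2f} s/link "
          f"({d_link:+.2f} s/link, {pct:+.0f}%)")
```

Running it prints `Simple -> Medium: +9.6 s, 1.91 s/link (-0.38 s/link, -17%)` and `Medium -> Complex: -1.5 s, 1.53 s/link (-0.39 s/link, -20%)`, with `per_link["Simple"]` at about 2.29.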

If participants followed precise mathematical procedures to calculate the optimal alternative, we would expect the opposite effect: larger models would increase response time at an increasing rate. Instead, it seems that participants rely more on intuition and do not examine the details. Treemaps appear to be more amenable to this kind of decision-making.

Confidence

Omnibus Tests

Standard Multi-Variate ANOVA

## 
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
##                  Df test stat approx F num Df den Df    Pr(>F)    
## (Intercept)       1   0.73419  312.123      1    113 < 2.2e-16 ***
## Group             2   0.07602    4.648      2    113   0.01148 *  
## Complexity        1   0.35260   30.500      2    112 2.665e-11 ***
## Group:Complexity  2   0.06419    1.873      4    226   0.11600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Welch’s W Test - Omni

## **** Welch W Test:   W(2,74.74)=4.91, p=0.0099

Welch’s W Test - Contrasts

## Unadjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :5.518
##   F-Crit :5.16 (1/113)
##   p.val  :0.0411
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :3.779
##   F-Crit :5.16 (1/113)
##   p.val  :0.1088
##   -------:
## 
## Adjusted Contrasts Test Result:
##   Contr. :-2Diag. 1Chart 1Tree
##   F-Obs  :4.762
##   F-Crit :3.989 (1/64.773)
##   p.val  :0.0327
##   -------:
##   Contr. :0Diag. -1Chart 1Tree
##   F-Obs  :4.59
##   F-Crit :3.972 (1/73.037)
##   p.val  :0.0355
##   -------:

Kruskal Wallis - Omni

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Confidence by Group
## Kruskal-Wallis chi-squared = 19.925, df = 2, p-value = 4.714e-05

Kruskal Wallis - Posthocs

## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##              obs.dif critical.dif difference
## Diag.-Chart 55.82368     29.49147       TRUE
## Diag.-Tree  11.89386     29.49147      FALSE

Summary

Participants' confidence in their answers is captured on a four-point rating scale. The responses are mapped to the values $\{-3,-1,+1,+3\}$, which are in turn treated as an interval scale. As with accuracy and time, we sum the nine (9) values provided for each combination of visualization and complexity level and perform the analysis on the resulting total.

As with the previous measures, confidence shows significant effects of both visualization, $H(2)=19.92, p < 0.001$ (Welch's $W(2,74.74)=4.91, p \simeq 0.0099$), and complexity level, $F(2,112) = 30.5, p < 0.001$. The interaction between visualization and complexity is not statistically significant ($F(4,226) = 1.87, p = 0.116$).
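Welch's $W$ has a non-integer denominator degrees of freedom (74.74 here). The Python sketch below implements the textbook Welch (1951) heteroscedastic one-way ANOVA to show where the precision weights and the fractional df come from. It is an illustration only, not the R routine that produced the reported output; `welch_anova` is a hypothetical helper name.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA for unequal variances.

    groups: list of sequences, one per group, each with at least two
    observations and non-zero sample variance. Returns (F, df1, df2).
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # weights n_i / s_i^2
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w  # weighted grand mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    correction = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)  # generally non-integer
    return between / correction, k - 1, df2


print(welch_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

For these three small groups the sketch returns $F = 18/7 \approx 2.571$ with $df = (2, 4)$. Groups with unequal variances or sizes yield a fractional second df, as in the reported $W(2,74.74)$.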

Interactions / Simple Effects

Kruskal and Dunnett Comparisons

##  ***************************************
##  ****** COMPLEXITY LEVEL: Simple *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   2  2.8069 0.06461 .
##       113                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Confidence by Group
## Kruskal-Wallis chi-squared = 6.8298, df = 2, p-value = 0.03288
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##                obs.dif critical.dif difference
## Diag.-Chart 16.8559211     17.07563      FALSE
## Diag.-Tree   0.8677632     17.07563      FALSE
## 
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24640.6  -9500.6   -500.2   8442.5  19167.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  27753.9     1890.3  14.682   <2e-16 ***
## GroupChart    6161.0     2708.3   2.275   0.0248 *  
## GroupTree     -226.7     2708.3  -0.084   0.9334    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11960 on 113 degrees of freedom
## Multiple R-squared:  0.05863,    Adjusted R-squared:  0.04197 
## F-statistic: 3.519 on 2 and 113 DF,  p-value: 0.03292
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0   6161.0     2708.3   2.275   0.0461 *
## Tree - Diag. == 0    -226.7     2708.3  -0.084   0.9949  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Medium *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.7381 0.4803
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Confidence by Group
## Kruskal-Wallis chi-squared = 5.8812, df = 2, p-value = 0.05283
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##               obs.dif critical.dif difference
## Diag.-Chart 17.357237     17.07563       TRUE
## Diag.-Tree   3.133553     17.07563      FALSE
## 
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -26684  -9891  -2369  10725  25699 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    20996       2214   9.485 4.74e-16 ***
## GroupChart      7023       3171   2.215   0.0288 *  
## GroupTree       1078       3171   0.340   0.7345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14000 on 113 degrees of freedom
## Multiple R-squared:  0.04738,    Adjusted R-squared:  0.03052 
## F-statistic:  2.81 on 2 and 113 DF,  p-value: 0.0644
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## Chart - Diag. == 0     7023       3171   2.215   0.0533 .
## Tree - Diag. == 0      1078       3171   0.340   0.9199  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

## 
## 
##  ***************************************
##  ****** COMPLEXITY LEVEL: Complex *******
##  ***************************************
## 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.1773 0.8378
##       113               
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Confidence by Group
## Kruskal-Wallis chi-squared = 9.7865, df = 2, p-value = 0.007497
## 
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed) 
## p.value: 0.05 
## Comparisons
##               obs.dif critical.dif difference
## Diag.-Chart 23.628289     17.07563       TRUE
## Diag.-Tree   9.378289     17.07563      FALSE
## 
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -27235 -11176  -2243  11516  28809 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    17886       2293   7.802 3.33e-12 ***
## GroupChart      9413       3285   2.866  0.00496 ** 
## GroupTree       3396       3285   1.034  0.30335    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14500 on 113 degrees of freedom
## Multiple R-squared:  0.06907,    Adjusted R-squared:  0.05259 
## F-statistic: 4.192 on 2 and 113 DF,  p-value: 0.01753
## 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Dunnett Contrasts
## 
## 
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)   
## Chart - Diag. == 0     9413       3285   2.866  0.00954 **
## Tree - Diag. == 0      3396       3285   1.034  0.48397   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Interaction Diagrams

The figures above show the 95% family-wise confidence intervals. At every complexity level, participants are more confident in their responses with charts than with goal diagrams; the difference is significant at the 0.05 level in the Dunnett comparisons for simple and complex models, and in the Kruskal-Wallis post-hoc comparisons for medium and complex models. The same cannot be said for treemaps. Finally, as complexity increases, confidence ratings drop (figure below), even though participants are becoming more familiar with the visualization. The differences between visualizations seem to grow as complexity increases, with goal diagrams performing the worst.
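The point estimates behind the last two observations can be read off the per-level model outputs above. The Python sketch below does so, relying on the fact that with treatment coding each group mean equals the intercept (goal diagram) plus that group's coefficient. The large magnitudes suggest the models use a transformed confidence total, so only the direction of change is meaningful here. The sketch says nothing about significance; the confidence intervals remain the basis for inference.

```python
# Values copied from the per-level lm and Dunnett outputs above (Simple, Medium, Complex).
INTERCEPT = [27753.9, 20996.0, 17886.0]   # goal-diagram (reference) mean
CHART_VS_DIAG = [6161.0, 7023.0, 9413.0]  # Dunnett estimate: Chart - Diag.
TREE_VS_DIAG = [-226.7, 1078.0, 3396.0]   # Dunnett estimate: Tree - Diag.


def increasing(xs):
    return all(a < b for a, b in zip(xs, xs[1:]))


def decreasing(xs):
    return all(a > b for a, b in zip(xs, xs[1:]))


# Treatment coding: group mean = intercept + group coefficient.
group_means = {
    "Goal Diagram": INTERCEPT,
    "Chart": [i + d for i, d in zip(INTERCEPT, CHART_VS_DIAG)],
    "Treemap": [i + d for i, d in zip(INTERCEPT, TREE_VS_DIAG)],
}

for name, means in group_means.items():
    print(name, "drops with complexity:", decreasing(means))
print("Chart gap widens:", increasing(CHART_VS_DIAG))
print("Treemap gap widens:", increasing(TREE_VS_DIAG))
```

All five checks print `True`: every group's estimated confidence falls with complexity, and both gaps relative to goal diagrams widen.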