Abstract
This page accompanies the SAC 2018 paper submission. The data, redacted to include only information relevant to this study and to remove personally identifying information, can be found here. It is shared in compliance with the informed-consent clause “non-identifying anonymous responses […] may be used for research publications and open sharing within the research community”.
We filter out participants who do not perform well in the domain and conceptual exercises. The maximum DomainTest.Total score is 15 and the maximum Concept.Test score is 7. Depending on the “cleanness” level, we filter out those scoring 12 (domain) and 4 (concept) or below, or those scoring 14 and 6 or below. Before that, we remove participants with color blindness.
## myData$Group: Diag.
## [1] 40
## --------------------------------------------------------
## myData$Group: Chart
## [1] 38
## --------------------------------------------------------
## myData$Group: Tree
## [1] 38
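The filtering step described above can be sketched as follows. This is a minimal illustration on made-up rows, not the study's code: the column name `ColorBlind` and the helper `filter_participants` are hypothetical, and the sketch assumes that a participant is kept only when both scores are strictly above the respective cut-offs.

```r
# Hypothetical participant table; ColorBlind is an assumed column name.
participants <- data.frame(
  id               = 1:6,
  DomainTest.Total = c(15, 13, 12, 14, 15, 15),
  Concept.Test     = c(7, 5, 6, 6, 7, 4),
  ColorBlind       = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Keep participants who are not color blind and who score strictly above
# both cut-offs (assumed reading of "12 and 4 or below").
filter_participants <- function(d, domain_cut, concept_cut) {
  keep <- !d$ColorBlind &
    d$DomainTest.Total > domain_cut &
    d$Concept.Test > concept_cut
  d[keep, , drop = FALSE]
}

lenient <- filter_participants(participants, domain_cut = 12, concept_cut = 4)
strict  <- filter_participants(participants, domain_cut = 14, concept_cut = 6)

print(lenient$id)
print(strict$id)
```

The stricter cut-offs trade sample size for cleaner data, which is why the two levels are kept as parameters rather than hard-coded.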
The following are Levene tests for homogeneity of variance. We generally assume heteroskedasticity, which is corrected in the simple effects analyses through transformations.
## [1] "Rank by Group:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 9.9142 6.509e-05 ***
## 345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Top by Group:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 6.5524 0.001611 **
## 345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Time by Group:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 21.503 1.583e-09 ***
## 345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Confidence by Group:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 5.9932 0.002763 **
## 345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Rank by Group and Complexity:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 8 6.7036 3.705e-08 ***
## 339
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Top by Group and Complexity:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 8 1.4713 0.1664
## 339
## [1] "Time by Group and Complexity:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 8 4.6895 1.837e-05 ***
## 339
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Confidence by Group and Complexity:"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 8 4.0848 0.0001158 ***
## 339
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
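For readers who want to reproduce the kind of test shown above without extra packages, Levene's test with `center = median` is equivalent to a one-way ANOVA on absolute deviations from each group's median. The sketch below uses simulated values with the reported group sizes (40, 38, 38), one value per participant; it does not use the study's data, and `levene_median` is a hypothetical helper name.

```r
# Simulated data only: three groups with deliberately unequal spreads.
set.seed(1)
group <- factor(rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38)),
                levels = c("Diag.", "Chart", "Tree"))
score <- rnorm(length(group), mean = 10, sd = c(1, 2, 4)[as.integer(group)])

# Levene's test with center = median (Brown-Forsythe variant):
# a one-way ANOVA on absolute deviations from each group's median.
levene_median <- function(y, g) {
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))
}

res <- levene_median(score, group)
print(res)
```

With 116 simulated participants the residual degrees of freedom are 113, matching the per-complexity tests later in this page; the by-group tests above pool the three complexity levels, hence 345.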
trans <- c(Top = 0.4,Rank = 0.4,Time = 0,Confidence = 3)
myDTr <- transform_Manual(myD,respVars,trans)
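The helper `transform_Manual` is not shown on this page. One plausible reading of the `trans` vector, offered only as an assumption, is a per-variable power transformation in which a power of 0 denotes a logarithm (Box-Cox style). The function `power_transform_sketch` below is a hypothetical stand-in under that assumption; the log base and any shifting of non-positive values are unknown and left out.

```r
# Hypothetical stand-in for the unshown helper: raise each response variable
# to its listed power, using the natural log when the power is 0.
power_transform_sketch <- function(d, vars, powers) {
  for (v in vars) {
    p <- powers[[v]]
    d[[v]] <- if (p == 0) log(d[[v]]) else d[[v]]^p
  }
  d
}

# Toy values chosen so the results are easy to check by hand.
toy <- data.frame(Top = c(1, 32), Rank = c(32, 243),
                  Time = c(1, exp(2)), Confidence = c(1, 2))
trans <- c(Top = 0.4, Rank = 0.4, Time = 0, Confidence = 3)

toy_tr <- power_transform_sketch(toy, names(trans), trans)
print(toy_tr)
```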
The experimental study follows a \(3\times 3\) mixed design, with the choice of visualization as the between-subjects factor and complexity as the within-subjects factor. All effects below are reported as significant at \(p<.05\).
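A design of this shape can be set up in R roughly as follows. The sketch uses simulated scores in wide format (one row per participant, one column per complexity level); the column names and values are assumptions. The multivariate linear model is base R. The repeated-measures table is produced only if the `car` package happens to be installed, using its `Anova` function with `idata` and `idesign`.

```r
# Simulated wide-format data: 116 participants, three within-subject measures.
set.seed(3)
n <- c(40, 38, 38)
wide <- data.frame(
  Group   = factor(rep(c("Diag.", "Chart", "Tree"), times = n),
                   levels = c("Diag.", "Chart", "Tree")),
  Simple  = rnorm(sum(n), mean = 2.6, sd = 0.5),
  Medium  = rnorm(sum(n), mean = 2.8, sd = 0.3),
  Complex = rnorm(sum(n), mean = 3.0, sd = 0.2)
)

# Between-subjects factor on the right, repeated measures bound on the left.
fit <- lm(cbind(Simple, Medium, Complex) ~ Group, data = wide)

# Within-subjects design: one row per repeated measure.
idata <- data.frame(
  Complexity = factor(c("Simple", "Medium", "Complex"),
                      levels = c("Simple", "Medium", "Complex"))
)

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, idata = idata, idesign = ~ Complexity, type = "II"))
}
```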
Talk about the omnibus tests.
For each level of complexity (simple, medium, complex) we perform the following tests:
##
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.75535 348.88 1 113 < 2.2e-16 ***
## Group 2 0.07069 4.30 2 113 0.015892 *
## Complexity 1 0.72223 145.61 2 112 < 2.2e-16 ***
## Group:Complexity 2 0.12449 3.75 4 226 0.005636 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## **** Welch W Test: W(2,72.71)=5.71, p=0.005
## Unadjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :3.057
## F-Crit :5.16 (1/113)
## p.val :0.1662
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :5.538
## F-Crit :5.16 (1/113)
## p.val :0.0407
## -------:
##
## Adjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :3.096
## F-Crit :3.963 (1/78.603)
## p.val :0.0824
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :5.461
## F-Crit :3.996 (1/61.85)
## p.val :0.0227
## -------:
##
## Kruskal-Wallis rank sum test
##
## data: Rank by Group
## Kruskal-Wallis chi-squared = 13.356, df = 2, p-value = 0.001259
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 47.36118 29.49147 TRUE
## Diag.-Tree 18.94452 29.49147 FALSE
## (Intercept) Group Complexity Group:Complexity
## 1 2 1 2
In terms of ranking identification, there is a significant (\(p<0.05\)) main effect of the chosen visualization, Kruskal-Wallis \(H(2)=13.36, p = 0.001\) (Welch’s \(W(2,72.71)=5.71, p=0.005\)), meaning that the choice of visualization affects the ability of participants to specify the rankings of optimal solutions. The level of complexity also has a significant effect on ranking identification, \(F(2,112)=145.61, p<0.01\). Interestingly, visualization and complexity level have a statistically significant interaction, \(F(4,226)=3.75, p<0.01\). This means that the level of complexity seems to affect success rate in different ways for different visualizations.
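The two omnibus tests for the visualization effect quoted above are both available in base R: `kruskal.test` for the rank-based test and `oneway.test` with `var.equal = FALSE` for Welch's test. The sketch below runs them on simulated scores (not the study's data) and formats the results in the same style as the text.

```r
# Simulated scores with unequal variances across the three visualizations.
set.seed(42)
group <- factor(rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38)),
                levels = c("Diag.", "Chart", "Tree"))
rank_score <- rnorm(length(group),
                    mean = c(10, 12, 10.5)[as.integer(group)],
                    sd   = c(3, 1.5, 2.5)[as.integer(group)])

kw    <- kruskal.test(rank_score ~ group)                 # rank-based omnibus test
welch <- oneway.test(rank_score ~ group, var.equal = FALSE) # Welch's W

cat(sprintf("Kruskal-Wallis H(%d) = %.2f, p = %.4f\n",
            kw$parameter, kw$statistic, kw$p.value))
cat(sprintf("Welch W(%d, %.2f) = %.2f, p = %.4f\n",
            welch$parameter[1], welch$parameter[2],
            welch$statistic, welch$p.value))
```

Both tests avoid the equal-variance assumption that the Levene tests call into question, which is presumably why they are reported side by side.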
## ***************************************
## ****** COMPLEXITY LEVEL: Simple *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.7812 0.06621 .
## 113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Kruskal-Wallis rank sum test
##
## data: Rank by Group
## Kruskal-Wallis chi-squared = 10.353, df = 2, p-value = 0.005647
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 15.996053 17.07563 FALSE
## Diag.-Tree 8.135526 17.07563 FALSE
##
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34009 -0.22987 0.03744 0.34124 0.67980
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.61790 0.07431 35.227 <2e-16 ***
## GroupChart 0.21924 0.10647 2.059 0.0418 *
## GroupTree -0.11932 0.10647 -1.121 0.2648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.47 on 113 degrees of freedom
## Multiple R-squared: 0.08246, Adjusted R-squared: 0.06622
## F-statistic: 5.077 on 2 and 113 DF, p-value: 0.007734
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Rank ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.2192 0.1065 2.059 0.0764 .
## Tree - Diag. == 0 -0.1193 0.1065 -1.121 0.4299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Medium *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.5374 0.08357 .
## 113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Kruskal-Wallis rank sum test
##
## data: Rank by Group
## Kruskal-Wallis chi-squared = 11.895, df = 2, p-value = 0.002612
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 24.44079 17.07563 TRUE
## Diag.-Tree 20.20395 17.07563 TRUE
##
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.47999 -0.12270 0.09012 0.18871 0.37624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.76448 0.05346 51.708 <2e-16 ***
## GroupChart 0.17760 0.07660 2.319 0.0222 *
## GroupTree 0.03766 0.07660 0.492 0.6239
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3381 on 113 degrees of freedom
## Multiple R-squared: 0.0497, Adjusted R-squared: 0.03288
## F-statistic: 2.955 on 2 and 113 DF, p-value: 0.05613
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Rank ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.17760 0.07660 2.319 0.0415 *
## Tree - Diag. == 0 0.03766 0.07660 0.492 0.8412
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Complex *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.9907 0.3745
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Rank by Group
## Kruskal-Wallis chi-squared = 15.096, df = 2, p-value = 0.0005271
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 20.63092 17.07563 TRUE
## Diag.-Tree 25.19671 17.07563 TRUE
##
## Call:
## lm(formula = Rank ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64361 -0.04747 0.08036 0.12547 0.17594
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.00243 0.03283 91.467 <2e-16 ***
## GroupChart 0.09558 0.04703 2.032 0.0445 *
## GroupTree 0.05047 0.04703 1.073 0.2854
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2076 on 113 degrees of freedom
## Multiple R-squared: 0.03534, Adjusted R-squared: 0.01827
## F-statistic: 2.07 on 2 and 113 DF, p-value: 0.1309
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Rank ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.09558 0.04703 2.032 0.0811 .
## Tree - Diag. == 0 0.05047 0.04703 1.073 0.4591
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
To further examine these interactions, we follow the simple effects approach described above, in which we fix each level of the complexity factor and study the effects of the visualization factor. Considering simple models only, neither of the spatial visualizations (treemap or chart) leads to significantly better performance than goal diagrams; charts do so with marginal significance, \(p=0.076\) (Dunnett post-hoc, Chart - Diag. == 0 comparison for Simple). Moving on to medium-size goal models, however, we observe that charts become significantly (\(p=0.04\)) more effective than diagrams (same Dunnett post-hoc for Medium). In complex models, charts again appear to perform better than diagrams, although the Dunnett test yields \(p=0.08\), slightly beyond our \(0.05\) threshold. The 95% family-wise confidence intervals of the Figure above shed more light on these effects. {} and {} represent charts, treemaps and goal diagrams, respectively. We observe that we can be reasonably confident that charts are consistently better than diagrams, while the results remain inconclusive for treemaps.
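The simple effects procedure can be sketched as a loop over complexity levels: slice the data, fit `lm(Rank ~ Group)` with goal diagrams as the reference level, and compare each other visualization against it. The data below are simulated, and `sim` and `simple_effects` are hypothetical names. The coefficient table gives the unadjusted comparisons; the Dunnett-adjusted p-values are computed only if the `multcomp` package is installed.

```r
# Simulated long-format data: every participant measured at each complexity level.
set.seed(7)
group <- factor(rep(c("Diag.", "Chart", "Tree"), times = c(40, 38, 38)),
                levels = c("Diag.", "Chart", "Tree"))
complexity <- c("Simple", "Medium", "Complex")
sim <- do.call(rbind, lapply(complexity, function(cx) {
  data.frame(Group = group, Complexity = cx,
             Rank = rnorm(length(group), mean = 2.7, sd = 0.4))
}))

simple_effects <- lapply(complexity, function(cx) {
  dSlice <- sim[sim$Complexity == cx, ]
  fit <- lm(Rank ~ Group, data = dSlice)
  # Treatment coding: GroupChart and GroupTree are differences from Diag.
  out <- list(coef = coef(summary(fit)))
  if (requireNamespace("multcomp", quietly = TRUE)) {
    dunnett <- multcomp::glht(fit, linfct = multcomp::mcp(Group = "Dunnett"))
    out$dunnett_p <- summary(dunnett)$test$pvalues  # single-step adjusted
  }
  out
})
names(simple_effects) <- complexity

print(simple_effects$Simple$coef)
```

The t values of the treatment-coded coefficients coincide with those of the Dunnett contrasts; only the p-values differ, because the Dunnett procedure adjusts for the two comparisons made against the same control.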
The question of whether complexity level negatively affects success in rank identification has a negative answer. As the Figure below demonstrates, success increases with complexity for all visualizations, suggesting that as participants become more familiar with the visualization, model size does not deter them from finding the correct answers.
##
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.91818 1268.10 1 113 < 2.2e-16 ***
## Group 2 0.07656 4.68 2 113 0.0111075 *
## Complexity 1 0.45403 46.57 2 112 1.912e-15 ***
## Group:Complexity 2 0.17615 5.46 4 226 0.0003272 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## **** Welch W Test: W(2,73.04)=6.32, p=0.0029
## Unadjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :6.602
## F-Crit :5.16 (1/113)
## p.val :0.023
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :2.766
## F-Crit :5.16 (1/113)
## p.val :0.1981
## -------:
##
## Adjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :6.974
## F-Crit :3.956 (1/82.989)
## p.val :0.0099
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :2.611
## F-Crit :3.994 (1/62.572)
## p.val :0.1112
## -------:
##
## Kruskal-Wallis rank sum test
##
## data: Top by Group
## Kruskal-Wallis chi-squared = 19.269, df = 2, p-value = 6.543e-05
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 56.06053 29.49147 TRUE
## Diag.-Tree 35.56930 29.49147 TRUE
If we restrict our focus to comparing how many times the participants’ top response matches that of the evaluation algorithm, we see similar results. There are significant main effects both due to the visualization, Kruskal-Wallis \(H(2)= 19.27, p<0.001\) (Welch’s \(W(2,73.04)=6.32, p=0.003\)), and due to the complexity level, \(F(2,112) = 46.57, p<0.01\), as well as a significant interaction, \(F(4,226) = 5.46, p<0.01\).
## ***************************************
## ****** COMPLEXITY LEVEL: Simple *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.8179 0.06394 .
## 113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Kruskal-Wallis rank sum test
##
## data: Top by Group
## Kruskal-Wallis chi-squared = 10.353, df = 2, p-value = 0.005647
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 15.996053 17.07563 FALSE
## Diag.-Tree 8.135526 17.07563 FALSE
##
## Call:
## lm(formula = Top ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73686 -0.17486 0.02834 0.25849 0.51395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.98477 0.05603 35.424 <2e-16 ***
## GroupChart 0.16604 0.08027 2.068 0.0409 *
## GroupTree -0.08943 0.08027 -1.114 0.2676
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3544 on 113 degrees of freedom
## Multiple R-squared: 0.08265, Adjusted R-squared: 0.06642
## F-statistic: 5.091 on 2 and 113 DF, p-value: 0.007642
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Top ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.16604 0.08027 2.068 0.0749 .
## Tree - Diag. == 0 -0.08943 0.08027 -1.114 0.4340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Medium *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.826 0.1658
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Top by Group
## Kruskal-Wallis chi-squared = 18.272, df = 2, p-value = 0.0001077
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 27.31776 17.07563 TRUE
## Diag.-Tree 28.27829 17.07563 TRUE
##
## Call:
## lm(formula = Top ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82220 -0.17079 0.08541 0.22927 0.47878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.81977 0.06113 29.771 < 2e-16 ***
## GroupChart 0.25620 0.08757 2.925 0.00416 **
## GroupTree 0.16093 0.08757 1.838 0.06875 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3866 on 113 degrees of freedom
## Multiple R-squared: 0.07211, Adjusted R-squared: 0.05569
## F-statistic: 4.391 on 2 and 113 DF, p-value: 0.01457
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Top ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.25620 0.08757 2.925 0.00801 **
## Tree - Diag. == 0 0.16093 0.08757 1.838 0.12324
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Complex *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.5121 0.2249
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Top by Group
## Kruskal-Wallis chi-squared = 15.096, df = 2, p-value = 0.0005271
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 20.63092 17.07563 TRUE
## Diag.-Tree 25.19671 17.07563 TRUE
##
## Call:
## lm(formula = Top ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91318 -0.02263 0.13896 0.33763 0.33763
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.09960 0.08158 25.736 <2e-16 ***
## GroupChart 0.17074 0.11688 1.461 0.147
## GroupTree -0.02793 0.11688 -0.239 0.812
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.516 on 113 degrees of freedom
## Multiple R-squared: 0.02853, Adjusted R-squared: 0.01134
## F-statistic: 1.659 on 2 and 113 DF, p-value: 0.1949
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Top ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 0.17074 0.11688 1.461 0.252
## Tree - Diag. == 0 -0.02793 0.11688 -0.239 0.959
## (Adjusted p values reported -- single-step method)
Moving on to simple effects, the confidence intervals comparing visualizations for each complexity level can be seen in the above Figures. As with rank identification, for simple models charts appear to be more effective than diagrams to a near-significant level (\(p=0.075\), Dunnett). For medium complexity the effect is statistically significant (\(p=0.008\)). For complex models the estimated difference still favors charts, but it is not significant (\(p=0.25\)).
##
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.82439 530.47 1 113 < 2.2e-16 ***
## Group 2 0.20535 14.60 2 113 2.290e-06 ***
## Complexity 1 0.17298 11.71 2 112 2.403e-05 ***
## Group:Complexity 2 0.08117 2.39 4 226 0.05173 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## **** Welch W Test: W(2,70.3)=15.36, p=0
## Unadjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :24.654
## F-Crit :5.16 (1/113)
## p.val :0
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :4.547
## F-Crit :5.16 (1/113)
## p.val :0.0703
## -------:
##
## Adjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :19.364
## F-Crit :4.009 (1/57.302)
## p.val :0
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :6.535
## F-Crit :3.993 (1/62.97)
## p.val :0.013
## -------:
##
## Kruskal-Wallis rank sum test
##
## data: Time by Group
## Kruskal-Wallis chi-squared = 44.414, df = 2, p-value = 2.268e-10
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 43.37632 29.49147 TRUE
## Diag.-Tree 87.68333 29.49147 TRUE
Response time is measured as the time between the loading of the screen showing the visualization and the question, and the moment the participant clicks to proceed to the next page. We add up the response times of the nine (9) tasks associated with each combination of visualization and complexity level, and perform our analysis using these totals.
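The aggregation just described can be sketched with `aggregate`. The task log below is hypothetical: the column names `loaded` and `clicked` and the timestamp values are assumptions made only to show the computation.

```r
# Hypothetical long-format log: one row per participant, complexity level and task.
tasks <- expand.grid(task = 1:9,
                     Complexity = c("Simple", "Medium", "Complex"),
                     Subject = c("p1", "p2"),
                     stringsAsFactors = FALSE)

# Made-up timestamps in seconds: screen load and click to proceed.
tasks$loaded  <- 0
tasks$clicked <- ifelse(tasks$Subject == "p1", 30, 40) + tasks$task

# Response time per task, then the total over the nine tasks of each cell.
tasks$seconds <- tasks$clicked - tasks$loaded
totals <- aggregate(seconds ~ Subject + Complexity, data = tasks, FUN = sum)
print(totals)
```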
Analyzing differences in response time across visualizations, we also observe significant main effects due to the chosen visualization, \(H(2)=44.41, p<0.001\) (Welch’s \(W(2,70.3)=15.36, p<0.001\)), and due to complexity level, \(F(2,112) = 11.71, p<0.01\), as well as a marginal interaction between the two factors, \(F(4,226) = 2.39, p = 0.052\). Confidence intervals per complexity level can be seen in the Figures above. Participants generally respond quicker with treemaps and charts than with goal diagrams, and this can be claimed with statistical significance for treemaps.
## ***************************************
## ****** COMPLEXITY LEVEL: Simple *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.7791 0.4613
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Time by Group
## Kruskal-Wallis chi-squared = 8.6062, df = 2, p-value = 0.01353
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 12.49276 17.07563 FALSE
## Diag.-Tree 22.26908 17.07563 TRUE
##
## Call:
## lm(formula = Time ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77840 -0.17413 -0.02576 0.17574 0.53213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.45059 0.03918 62.541 < 2e-16 ***
## GroupChart -0.09509 0.05614 -1.694 0.09303 .
## GroupTree -0.16240 0.05614 -2.893 0.00458 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2478 on 113 degrees of freedom
## Multiple R-squared: 0.06976, Adjusted R-squared: 0.0533
## F-statistic: 4.237 on 2 and 113 DF, p-value: 0.01681
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Time ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 -0.09509 0.05614 -1.694 0.16426
## Tree - Diag. == 0 -0.16240 0.05614 -2.893 0.00882 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Medium *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.7117 0.1852
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Time by Group
## Kruskal-Wallis chi-squared = 17.154, df = 2, p-value = 0.0001883
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 19.18355 17.07563 TRUE
## Diag.-Tree 31.22303 17.07563 TRUE
##
## Call:
## lm(formula = Time ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75024 -0.13377 0.01302 0.15187 0.58744
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.60764 0.03671 71.025 < 2e-16 ***
## GroupChart -0.13744 0.05260 -2.613 0.0102 *
## GroupTree -0.21785 0.05260 -4.142 6.68e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2322 on 113 degrees of freedom
## Multiple R-squared: 0.1349, Adjusted R-squared: 0.1196
## F-statistic: 8.809 on 2 and 113 DF, p-value: 0.0002785
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Time ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 -0.1374 0.0526 -2.613 0.019369 *
## Tree - Diag. == 0 -0.2178 0.0526 -4.142 0.000132 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Complex *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.9243 0.1507
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Time by Group
## Kruskal-Wallis chi-squared = 26.501, df = 2, p-value = 1.759e-06
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 14.39737 17.07563 FALSE
## Diag.-Tree 38.87105 17.07563 TRUE
##
## Call:
## lm(formula = Time ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7829 -0.1024 0.0032 0.1446 0.4745
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.58917 0.03594 72.051 < 2e-16 ***
## GroupChart -0.09005 0.05148 -1.749 0.083 .
## GroupTree -0.25441 0.05148 -4.942 2.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2273 on 113 degrees of freedom
## Multiple R-squared: 0.1811, Adjusted R-squared: 0.1666
## F-statistic: 12.49 on 2 and 113 DF, p-value: 1.253e-05
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Time ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 -0.09005 0.05148 -1.749 0.147
## Tree - Diag. == 0 -0.25441 0.05148 -4.942 5.4e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
The figure below shows the effect difference in minutes, averaged for individual tasks. In non-simple cases, treemaps take nearly half as much time as goal diagrams, and charts take from 2/3 to 3/4 as much time.
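These ratios can be checked against the mean times (in seconds) reported in the timing table of this section. The short sketch below copies the medium and complex rows from that table and divides each visualization's time by the goal diagram time; the variable names are illustrative.

```r
# Mean time in seconds, copied from the timing table in this section.
secs <- data.frame(
  Complexity  = c("Medium", "Complex"),
  Chart       = c(36.95614, 39.28070),
  Treemap     = c(29.83626, 25.87135),
  GoalDiagram = c(53.80556, 50.98333)
)

# Time relative to goal diagrams at the same complexity level.
ratio_treemap <- secs$Treemap / secs$GoalDiagram
ratio_chart   <- secs$Chart / secs$GoalDiagram

print(data.frame(Complexity = secs$Complexity,
                 Treemap = round(ratio_treemap, 2),
                 Chart   = round(ratio_chart, 2)))
```

The treemap ratios come out at about 0.55 and 0.51, and the chart ratios at about 0.69 and 0.77, in line with the "nearly half" and "2/3 to 3/4" figures.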
Since there is a main effect of complexity, it is sensible to also investigate effect size in terms of number of seconds. The numbers are as follows (Time in seconds, Size as the average number of contribution links):
## Complexity Visualization Time Size
## 1 Simple Chart 29.21345 13.33333
## 2 Medium Chart 36.95614 21.00000
## 3 Complex Chart 39.28070 25.33333
## 4 Simple Treemap 24.56140 13.33333
## 5 Medium Treemap 29.83626 21.00000
## 6 Complex Treemap 25.87135 25.33333
## 7 Simple Goal Diagram 37.99722 13.33333
## 8 Medium Goal Diagram 53.80556 21.00000
## 9 Complex Goal Diagram 50.98333 25.33333
Moving from simple to medium, participants spend \(9.6\) seconds more, but moving from medium to complex they spend \(1.5\) seconds less. It is interesting to examine this effect by dividing times by the average number of contribution links (i.e. the numbers involved).
For simple models, participants spend \(2.29\) seconds per contribution link. Moving from simple to medium and from medium to complex models, participants spend about \(0.38\) seconds less per contribution link at each step (i.e., \(1.91\) and \(1.53\) seconds respectively, a reduction of \(17\%\) and \(20\%\) from one level to the next). A simple plot displaying the relationship between the number of contribution links and the time spent on each can be seen below.
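The figures in the two paragraphs above can be reproduced from the timing table. The sketch below assumes that times are averaged over the three visualizations with equal weight, an assumption that recovers the reported values; the variable names are illustrative.

```r
# Values copied from the timing table (Time in seconds, Size in contribution links).
times <- data.frame(
  Complexity    = rep(c("Simple", "Medium", "Complex"), times = 3),
  Visualization = rep(c("Chart", "Treemap", "Goal Diagram"), each = 3),
  Time          = c(29.21345, 36.95614, 39.28070,
                    24.56140, 29.83626, 25.87135,
                    37.99722, 53.80556, 50.98333),
  Size          = rep(c(13.33333, 21, 25.33333), times = 3)
)
levels_cx <- c("Simple", "Medium", "Complex")

# Unweighted mean over the three visualizations (assumption).
mean_time <- sapply(levels_cx, function(cx) mean(times$Time[times$Complexity == cx]))
size      <- sapply(levels_cx, function(cx) unique(times$Size[times$Complexity == cx]))

step_seconds <- diff(mean_time)                 # Simple->Medium, Medium->Complex
per_link     <- mean_time / size                # seconds per contribution link
per_link_pct <- 100 * diff(per_link) / head(per_link, -1)

print(round(step_seconds, 1))
print(round(per_link, 2))
print(round(per_link_pct))
```

Dividing by model size is what turns the small drop in total time between medium and complex models into a steady per-link decrease.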
If participants followed precise mathematical procedures for calculating the optimal solution, we could expect the opposite effect: size would increase response time at an increasing rate. Instead, it seems that participants rely more on their intuition without looking at the details. Treemaps appear to be more amenable to this kind of decision.
##
## Type II Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.73419 312.123 1 113 < 2.2e-16 ***
## Group 2 0.07602 4.648 2 113 0.01148 *
## Complexity 1 0.35260 30.500 2 112 2.665e-11 ***
## Group:Complexity 2 0.06419 1.873 4 226 0.11600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## **** Welch W Test: W(2,74.74)=4.91, p=0.0099
## Unadjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :5.518
## F-Crit :5.16 (1/113)
## p.val :0.0411
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :3.779
## F-Crit :5.16 (1/113)
## p.val :0.1088
## -------:
##
## Adjusted Contrasts Test Result:
## Contr. :-2Diag. 1Chart 1Tree
## F-Obs :4.762
## F-Crit :3.989 (1/64.773)
## p.val :0.0327
## -------:
## Contr. :0Diag. -1Chart 1Tree
## F-Obs :4.59
## F-Crit :3.972 (1/73.037)
## p.val :0.0355
## -------:
##
## Kruskal-Wallis rank sum test
##
## data: Confidence by Group
## Kruskal-Wallis chi-squared = 19.925, df = 2, p-value = 4.714e-05
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 55.82368 29.49147 TRUE
## Diag.-Tree 11.89386 29.49147 FALSE
Participants’ confidence in their answers is acquired through a four-value rating scale. The responses are mapped to the values {-3,-1,+1,+3}, which are in turn treated as an interval scale. As in the case of accuracy and time, we sum up the nine (9) values provided for each combination of visualization and complexity level, and perform the analysis with the resulting total.
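The scoring of confidence can be sketched as a lookup followed by a sum. Only the numeric codes {-3,-1,+1,+3} come from the text; the labels `level1` to `level4` and the nine example responses are hypothetical placeholders.

```r
# Placeholder labels for the four rating options, lowest to highest confidence.
scale_codes <- c(level1 = -3, level2 = -1, level3 = 1, level4 = 3)

# Made-up responses of one participant to the nine tasks of one cell.
responses <- c("level4", "level3", "level3", "level2", "level4",
               "level1", "level3", "level4", "level3")

codes <- unname(scale_codes[responses])  # map labels to interval-scale values
total <- sum(codes)                      # total analysed per cell

# With nine tasks the total can range from -27 to +27.
possible_range <- 9 * range(scale_codes)

print(total)
print(possible_range)
```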
As with the previous measures, confidence also presents us with significant effects both due to visualization, \(H(2)=19.92, p<0.001\) (Welch’s \(W(2,74.74)=4.91, p=0.0099\)), and due to complexity level, \(F(2,112) = 30.5, p<0.01\). There is no statistically significant interaction.
## ***************************************
## ****** COMPLEXITY LEVEL: Simple *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.8069 0.06461 .
## 113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Kruskal-Wallis rank sum test
##
## data: Confidence by Group
## Kruskal-Wallis chi-squared = 6.8298, df = 2, p-value = 0.03288
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 16.8559211 17.07563 FALSE
## Diag.-Tree 0.8677632 17.07563 FALSE
##
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24640.6 -9500.6 -500.2 8442.5 19167.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27753.9 1890.3 14.682 <2e-16 ***
## GroupChart 6161.0 2708.3 2.275 0.0248 *
## GroupTree -226.7 2708.3 -0.084 0.9334
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11960 on 113 degrees of freedom
## Multiple R-squared: 0.05863, Adjusted R-squared: 0.04197
## F-statistic: 3.519 on 2 and 113 DF, p-value: 0.03292
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 6161.0 2708.3 2.275 0.0461 *
## Tree - Diag. == 0 -226.7 2708.3 -0.084 0.9949
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Medium *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.7381 0.4803
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Confidence by Group
## Kruskal-Wallis chi-squared = 5.8812, df = 2, p-value = 0.05283
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 17.357237 17.07563 TRUE
## Diag.-Tree 3.133553 17.07563 FALSE
##
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26684 -9891 -2369 10725 25699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20996 2214 9.485 4.74e-16 ***
## GroupChart 7023 3171 2.215 0.0288 *
## GroupTree 1078 3171 0.340 0.7345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14000 on 113 degrees of freedom
## Multiple R-squared: 0.04738, Adjusted R-squared: 0.03052
## F-statistic: 2.81 on 2 and 113 DF, p-value: 0.0644
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 7023 3171 2.215 0.0533 .
## Tree - Diag. == 0 1078 3171 0.340 0.9199
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## ***************************************
## ****** COMPLEXITY LEVEL: Complex *******
## ***************************************
##
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.1773 0.8378
## 113
##
## Kruskal-Wallis rank sum test
##
## data: Confidence by Group
## Kruskal-Wallis chi-squared = 9.7865, df = 2, p-value = 0.007497
##
## Multiple comparison test after Kruskal-Wallis, treatments vs control (two-tailed)
## p.value: 0.05
## Comparisons
## obs.dif critical.dif difference
## Diag.-Chart 23.628289 17.07563 TRUE
## Diag.-Tree 9.378289 17.07563 FALSE
##
## Call:
## lm(formula = Confidence ~ Group, data = dSlice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27235 -11176 -2243 11516 28809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17886 2293 7.802 3.33e-12 ***
## GroupChart 9413 3285 2.866 0.00496 **
## GroupTree 3396 3285 1.034 0.30335
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14500 on 113 degrees of freedom
## Multiple R-squared: 0.06907, Adjusted R-squared: 0.05259
## F-statistic: 4.192 on 2 and 113 DF, p-value: 0.01753
##
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Dunnett Contrasts
##
##
## Fit: lm(formula = Confidence ~ Group, data = dSlice)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chart - Diag. == 0 9413 3285 2.866 0.00954 **
## Tree - Diag. == 0 3396 3285 1.034 0.48397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
The 95% family-wise confidence intervals are seen in the above figures. We observe that participants are in all cases more confident in their responses with charts than with goal diagrams. The same cannot be said about treemaps. Finally, as complexity increases, the confidence ratings drop despite the participants becoming more familiar with the visualization (figure below). The differences between visualizations seem to amplify as complexity increases, with goal diagrams performing the worst.