Abstract
This page accompanies the ESEM 2017 short paper submission. A version including the R scripts can be found here. The data, redacted to include only information relevant to this study and to remove personally identifying information, can be found here and here. They are published in compliance with the clause “non-identifying responses and aggregations thereof will appear in the PI’s research report and potential publications of this research” of the informed consent.
A total of 41 persons showed up for the experiment, 21 assigned to the symbolic instrument and 20 to the textual one. Of these, 5 are filtered out due to low performance on questions testing their comprehension of the concepts presented in the videos, and 1 due to an incomplete response. The remaining 35 cases, 18 in the symbolic and 17 in the textual instrument, are used for the analysis. The participants are 26 females and 9 males; their ages are predominantly 18-29 and their field of study is primarily Business and Economics.
Models are annotated with the column name by which they are represented in the data file (C.1, C.2, etc.).
To measure the distance between participant responses we first map FD, PD, N, PS, FS (their actual responses) to the interval scale \([1,5]\). Then for each of the twenty scenarios and for each group (symbolic vs. textual) we perform all pair-wise comparisons between participant responses \(r_i\) and \(r_j\), \(1 \le i < j \le N\), to calculate the normalized distance \(|r_i - r_j|/4\); the average of all these \(N(N-1)/2\) distances is taken, \(N\) being the number of participants in each group. The resulting set consists of \(2\) (groups) \(\times\) \(20\) (exercises) \(= 40\) data points, each expressing the average distance between the ratings of every pair of participants.
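The pairwise measure described above can be sketched in a few lines. The study's own scripts are in R; the Python below is only an illustration, and both the coding FD=1 … FS=5 and the example responses are assumptions made for the sketch.

```python
from itertools import combinations

# Assumed coding of the five response labels onto the interval scale [1, 5].
SCALE = {"FD": 1, "PD": 2, "N": 3, "PS": 4, "FS": 5}

def mean_pairwise_distance(responses):
    """Average of the normalized distances |r_i - r_j| / 4 over all unordered pairs."""
    coded = [SCALE[r] for r in responses]
    pairs = list(combinations(coded, 2))  # N(N-1)/2 unordered pairs
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(abs(a - b) / 4 for a, b in pairs) / len(pairs)

# Hypothetical responses of four participants in one group to one scenario.
print(mean_pairwise_distance(["FS", "PS", "PS", "N"]))  # prints 0.25
```

A value of 0 means that all participants of the group gave the same answer; a value of 1 is reached only when two participants give opposite extremes.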
There is an obvious difference between positive and negative contributions, the latter yielding more disagreement. An effect of intensity can also be seen, though it is more subtle. Interestingly, a greater level of agreement appears to emerge in symbolic representations, especially in positive contributions.
In these two graphs we compare models in which the origin has a positive contribution with those whose origin has a negative one. Responses associated with models in which the satisfaction of the origin is “no information” (N) are omitted.
While a possible effect of the group (symbolic vs. textual) is very subtle, a clearer effect can be seen in the figure on the right: denial of the origin is always a source of disagreement. The difference is more pronounced in the positive contributions, where the disagreement is nevertheless generally lower.
To measure the distance between the participant responses and the normative ones according to the formal semantics (or simply the accuracy of the participants’ responses), we again code both responses on the scale \([1,5]\), which we interpret as interval. For each single response \(i\) on exercise \(j\) two distances are calculated: relative distance \(d_{ij} = obs_{ij} - norm_j\) and absolute distance \(|d_{ij}|\). When the former is positive, participants overestimate the satisfaction of the destination; when it is negative, they underestimate it.
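As a small illustration of the two distances, the following Python sketch computes them for a single response. It is not the study's R code; the coding FD=1 … FS=5 is an assumption, and the function name is hypothetical.

```python
# Assumed coding of the response labels onto the interval scale [1, 5].
SCALE = {"FD": 1, "PD": 2, "N": 3, "PS": 4, "FS": 5}

def distances(observed, normative):
    """Return (relative, absolute) distance of one response from the normative answer."""
    relative = SCALE[observed] - SCALE[normative]  # > 0: overestimation, < 0: underestimation
    return relative, abs(relative)

# Hypothetical case: the formal semantics yield PS but the participant answers FS.
print(distances("FS", "PS"))  # prints (1, 1)
```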
In the top left figure pp, p, n, nn stand for “\(++\)”, “\(+\)”, “\(-\)”, “\(--\)” or make, help, hurt, break, depending on the group considered. In the same figure it is clear that with positive labels the satisfaction of the destination goal is overestimated and with negative labels it is underestimated.
The top left figure seems to also show that when the origin goal is denied, the underestimation is more pronounced.
The following table summarizes one-sample t-tests (as well as the non-parametric equivalent) of the hypothesis that the distance from normative is zero (0), for each of the \(2\times 2\times 2\) (visualization group, label sign, label intensity) cells, as well as for each of the \(2\times 2 \times 2\) (visualization group, label sign, origin satisfaction) cells.
Relative distance is meaningful for this test. They are all independent comparisons.
Group | Mean Deviation | Significance (* means p < 0.05) |
---|---|---|
Symbolic.pp | 0.49 | * (non-zero) |
Symbolic.p | 0.14 | |
Symbolic.m | -0.31 | * (non-zero) |
Symbolic.mm | -0.38 | * (non-zero) |
Textual.pp | 0.61 | * (non-zero) |
Textual.p | 0.29 | * (non-zero) |
Textual.m | -0.20 | |
Textual.mm | -0.49 | * (non-zero) |
Symbolic.ppSat | 0.17 | * (non-zero) |
Symbolic.pSat | 0.39 | * (non-zero) |
Symbolic.mSat | 0.36 | |
Symbolic.mmSat | 0.72 | * (non-zero) |
Symbolic.ppDen | 0.83 | * (non-zero) |
Symbolic.pDen | -0.14 | |
Symbolic.mDen | -1.03 | * (non-zero) |
Symbolic.mmDen | -1.44 | * (non-zero) |
Textual.ppSat | -0.09 | |
Textual.pSat | 0.21 | |
Textual.mSat | 0.62 | |
Textual.mmSat | 0.82 | * (non-zero) |
Textual.ppDen | 1.41 | * (non-zero) |
Textual.pDen | 0.47 | |
Textual.mDen | -1.00 | * (non-zero) |
Textual.mmDen | -1.88 | * (non-zero) |
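For readers unfamiliar with the one-sample test used for each cell, the sketch below computes its t statistic and degrees of freedom with the Python standard library only. The sample values are hypothetical, not taken from the data; obtaining a p-value additionally requires the t distribution (for instance `scipy.stats.ttest_1samp`), which is left out here.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: the population mean equals mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = stdev(values) / math.sqrt(n)  # stdev is the sample (n - 1) estimate
    return (mean(values) - mu) / standard_error, n - 1

# Hypothetical relative distances of the participants in one cell.
t, df = one_sample_t([1, 0, 1, 2, 0, 1, -1, 1])
print(t, df)
```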
We now investigate under what circumstances participants overestimate and underestimate satisfaction. In the table below the average overestimation (positive value) or underestimation (negative value) per visualization style, contribution label and satisfaction origin is displayed.
## Label Make (++) Help (+) Hurt (-) Break (--)
## Group Origin
## Symbolic Sat 0.16666667 0.38888889 0.36111111 0.72222222
## Den 0.83333333 -0.13888889 -1.02777778 -1.44444444
## Textual Sat -0.08823529 0.20588235 0.61764706 0.82352941
## Den 1.41176471 0.47058824 -1.00000000 -1.88235294
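The aggregation behind a table of this kind is a plain group-wise mean of the relative distances. A minimal Python sketch, assuming each record is a hypothetical `(group, origin, label, relative_distance)` tuple rather than the actual layout of the data file:

```python
from collections import defaultdict
from statistics import mean

def mean_deviation_by_cell(records):
    """Average relative distance per (group, origin, label) cell."""
    cells = defaultdict(list)
    for group, origin, label, relative in records:
        cells[(group, origin, label)].append(relative)
    return {cell: mean(values) for cell, values in cells.items()}

# Hypothetical records; positive means overestimation, negative underestimation.
records = [
    ("Symbolic", "Den", "Break", -2),
    ("Symbolic", "Den", "Break", -1),
    ("Textual", "Sat", "Make", 0),
]
print(mean_deviation_by_cell(records))
```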
There are some differences between the two visualizations in terms of overestimation and underestimation. In addition, extreme labels (\(++\)) and (\(--\)) may naturally feature greater error. Cases in which the average error is consistent and substantial (\(>0.7\)) are:
(a) hurt and break labels with denied origin goals, where satisfaction is underestimated.
(b) break labels with satisfied origin goals, where satisfaction is overestimated. As above participants do not seem to perceive the satisfaction inversion of negative labels.
(c) make links with denied origin goals, where satisfaction is overestimated.
Looking at the descriptive images above, it becomes clear that many participants do not seem to perceive the satisfaction inversion of negative labels (cases (a) and (b)). In addition, they do not seem to accept that even a strong make relationship can result in a fully denied destination goal.
If we focus exclusively on cases in which the satisfaction of the origin is labelled as “no-information” (N) the following table describes the average deviation from normative.
## Label Make (++) Help (+) Hurt (-) Break (--)
## Group
## Symbolic 0.4444444 0.2222222 -0.2222222 -0.4444444
## Textual 0.4117647 0.1176471 -0.2352941 -0.3529412
It seems that users perceive contributions as generators of satisfaction rather than just propagators thereof. Positive contributions result in some satisfaction, and negative contributions result in some denial, irrespective of the satisfaction of the origin. This mental model of labels conflicts with the links’ ability to completely invert satisfaction, which users also had some trouble comprehending.
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dRelLab[, -c(1, 2)]
## Chi-Sq (approx.) = 14.706, df = 10, p-value = 0.1431
Box’s M test is not significant (\(p=0.1431\)); we can assume homogeneity of covariance matrices.
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.79087, p-value = 0.001132
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.59411, p-value = 9.134e-06
We do seem to have issues with normality and the small sample size does not entirely allow us to deal with it by appealing to robustness arguments.
##
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.001781 0.0589 1 33 0.809793
## Group 1 0.019667 0.6620 1 33 0.421674
## Sig 1 0.198117 8.1531 1 33 0.007378 **
## Group:Sig 1 0.005268 0.1748 1 33 0.678606
## Intensity 1 0.178278 7.1596 1 33 0.011516 *
## Group:Intensity 1 0.081121 2.9133 1 33 0.097245 .
## Sig:Intensity 1 0.097017 3.5455 1 33 0.068546 .
## Group:Sig:Intensity 1 0.012279 0.4102 1 33 0.526271
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response transformation matrix:
## Sig1
## pp -1
## p -1
## m 1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Sig1
## Sig1 31.46889
##
## Sum of squares and products for error:
## Sig1
## Sig1 127.3711
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1981169 8.153131 1 33 0.0073777 **
## Wilks 1 0.8018831 8.153131 1 33 0.0073777 **
## Hotelling-Lawley 1 0.2470646 8.153131 1 33 0.0073777 **
## Roy 1 0.2470646 8.153131 1 33 0.0073777 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant main effect of Sign, \(F(1,33)=8.1531, p=0.007378\).
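Because each within-subject contrast here has a single degree of freedom, the four multivariate statistics in the output above all follow from the hypothesis and error sums of squares. The Python sketch below (an illustration, not part of the original R analysis; the function name is hypothetical) recomputes them from the printed values.

```python
def one_df_tests(ss_hypothesis, ss_error, den_df):
    """Multivariate statistics for a contrast with one numerator degree of freedom."""
    total = ss_hypothesis + ss_error
    pillai = ss_hypothesis / total
    wilks = ss_error / total
    hotelling_lawley = ss_hypothesis / ss_error  # equals Roy's root in the 1-df case
    f_value = hotelling_lawley * den_df          # numerator df is 1
    return pillai, wilks, hotelling_lawley, f_value

# Values printed above for the Sig contrast, with 33 denominator df.
pillai, wilks, hl, f_value = one_df_tests(31.46889, 127.3711, 33)
print(pillai, wilks, hl, f_value)
```

Up to rounding of the printed sums of squares, this reproduces the Pillai, Wilks, Hotelling-Lawley and F values reported for Sign.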
##
## Response transformation matrix:
## Intensity1
## pp 1
## p -1
## m -1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Intensity1
## Intensity1 1.388889
##
## Sum of squares and products for error:
## Intensity1
## Intensity1 6.401699
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1782778 7.159557 1 33 0.011516 *
## Wilks 1 0.8217222 7.159557 1 33 0.011516 *
## Hotelling-Lawley 1 0.2169563 7.159557 1 33 0.011516 *
## Roy 1 0.2169563 7.159557 1 33 0.011516 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant main effect of Intensity, \(F(1,33)=7.1596, p=0.011516\).
In these tests, we utilize a different aggregation of the data, in which the satisfaction of the origin is also considered as one of the factors. In this data set the data points coming from models in which satisfaction of origin is “No Information” (N) are excluded.
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dRelSatPr[, -c(1, 2)]
## Chi-Sq (approx.) = 64.497, df = 36, p-value = 0.002446
Given the number of DVs here it is not surprising that Box’s M test fails. Following Tabachnick and Fidell, we randomly remove one of the cases in order to bring all cells to equal size. With equal cell sizes, and given that the test is not significant at the \(p<0.001\) level, we can proceed with the analysis.
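The random removal used to equalize the cells can be sketched as follows. This is an illustrative Python version, not the study's R code; the function name, the `(group, participant_id)` record layout and the fixed seed are assumptions made for the example.

```python
import random

def equalize_groups(cases, key, seed=None):
    """Randomly drop cases so that every group keeps as many cases as the smallest one."""
    rng = random.Random(seed)
    by_group = {}
    for case in cases:
        by_group.setdefault(key(case), []).append(case)
    target = min(len(members) for members in by_group.values())
    kept = []
    for members in by_group.values():
        kept.extend(rng.sample(members, target))  # sampling without replacement
    return kept

# Hypothetical participants: 18 symbolic and 17 textual, as in the analysed sample.
cases = [("Symbolic", i) for i in range(18)] + [("Textual", i) for i in range(17)]
pruned = equalize_groups(cases, key=lambda case: case[0], seed=1)
print(len(pruned))  # prints 34
```

With 34 remaining cases in two groups, the denominator degrees of freedom of the subsequent tests drop from 33 to 32.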
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.58052, p-value = 6.784e-06
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.51064, p-value = 1.597e-06
(see commentary above on normality)
##
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.001860 0.0596 1 32 0.808654
## Group 1 0.021013 0.6868 1 32 0.413381
## Sig 1 0.195266 7.7647 1 32 0.008884 **
## Group:Sig 1 0.006464 0.2082 1 32 0.651259
## Origin 1 0.275118 12.1451 1 32 0.001450 **
## Group:Origin 1 0.015996 0.5202 1 32 0.475996
## Sig:Origin 1 0.207551 8.3811 1 32 0.006778 **
## Group:Sig:Origin 1 0.033058 1.0940 1 32 0.303418
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response transformation matrix:
## Sig1
## ppSat -1
## ppDen -1
## pSat -1
## pDen -1
## mSat 1
## mDen 1
## mmSat 1
## mmDen 1
##
## Sum of squares and products for the hypothesis:
## Sig1
## Sig1 132.7206
##
## Sum of squares and products for error:
## Sig1
## Sig1 546.9706
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1952660 7.764693 1 32 0.008884 **
## Wilks 1 0.8047340 7.764693 1 32 0.008884 **
## Hotelling-Lawley 1 0.2426467 7.764693 1 32 0.008884 **
## Roy 1 0.2426467 7.764693 1 32 0.008884 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a main effect on Sign \(F(1,32)=7.7647, p=0.008884\).
##
## Response transformation matrix:
## Origin1
## ppSat -1
## ppDen 1
## pSat -1
## pDen 1
## mSat -1
## mDen 1
## mmSat -1
## mmDen 1
##
## Sum of squares and products for the hypothesis:
## Origin1
## Origin1 222.4853
##
## Sum of squares and products for error:
## Origin1
## Origin1 586.2059
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.2751177 12.1451 1 32 0.0014504 **
## Wilks 1 0.7248823 12.1451 1 32 0.0014504 **
## Hotelling-Lawley 1 0.3795344 12.1451 1 32 0.0014504 **
## Roy 1 0.3795344 12.1451 1 32 0.0014504 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a main effect of Satisfaction of Origin goal, \(F(1,32) = 12.1451, p=0.00145\).
The Sig:Origin interaction, \(F(1,32)=8.3811, p=0.006778\), is fairly intuitive: for negative contributions a denial of the origin leads to underestimation of the destination satisfaction (participants do not perceive satisfaction inversion). Positive contributions appear to be perceived as “blocking” the denial of the origin goal, hence the slight overestimation in such configurations.
We now turn to deviations based on absolute distance.
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dAbsLab[, -c(1, 2)]
## Chi-Sq (approx.) = 13.059, df = 10, p-value = 0.2204
Box’s M test is not significant (\(p=0.2204\)); we can assume homogeneity of covariance matrices.
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.88285, p-value = 0.02916
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.84106, p-value = 0.007859
We have a deviation from normality which, furthermore, cannot be cured with transformations. Judging from the “quantized” appearance of the graphs, we suspect that the culprit is the discrete scale [0..4] for measuring distance, probably combined with the small sample size. The tests that follow rely on their robustness to such deviations, as suggested in Tabachnick and Fidell, and on the observation that for “large” \(N\) the problem is less critical.
##
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.53737 38.331 1 33 5.502e-07 ***
## Group 1 0.02478 0.839 1 33 0.36646
## Sig 1 0.17961 7.225 1 33 0.01118 *
## Group:Sig 1 0.00335 0.111 1 33 0.74111
## Intensity 1 0.01794 0.603 1 33 0.44298
## Group:Intensity 1 0.00868 0.289 1 33 0.59458
## Sig:Intensity 1 0.05417 1.890 1 33 0.17848
## Group:Sig:Intensity 1 0.02010 0.677 1 33 0.41651
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant main effect of Sign, \(F(1,33)=7.225, p<0.05\); no other effect reaches significance.
##
## Response transformation matrix:
## Sig1
## pp -1
## p -1
## m 1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Sig1
## Sig1 16.05556
##
## Sum of squares and products for error:
## Sig1
## Sig1 73.33503
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1796113 7.224833 1 33 0.011179 *
## Wilks 1 0.8203887 7.224833 1 33 0.011179 *
## Hotelling-Lawley 1 0.2189343 7.224833 1 33 0.011179 *
## Roy 1 0.2189343 7.224833 1 33 0.011179 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response transformation matrix:
## Intensity1
## pp 1
## p -1
## m -1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Intensity1
## Intensity1 0.6422222
##
## Sum of squares and products for error:
## Intensity1
## Intensity1 35.14837
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0179439 0.6029678 1 33 0.44298
## Wilks 1 0.9820561 0.6029678 1 33 0.44298
## Hotelling-Lawley 1 0.0182718 0.6029678 1 33 0.44298
## Roy 1 0.0182718 0.6029678 1 33 0.44298
There is no Intensity effect.
To consider satisfaction of origin as one of the factors, a different data set is prepared, in which the N (no information) satisfaction value is eliminated.
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dAbsSat[, -c(1, 2)]
## Chi-Sq (approx.) = 59.233, df = 36, p-value = 0.008688
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dAbsSatPr[, -c(1, 2)]
## Chi-Sq (approx.) = 55.805, df = 36, p-value = 0.01868
Given the number of DVs here it is not surprising that Box’s M test fails. Following Tabachnick and Fidell, we randomly remove one of the cases in order to bring all cells to equal size. With equal cell sizes, and given that the test is not significant at the \(p<0.001\) level, we can proceed with the analysis.
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.58437, p-value = 7.377e-06
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.53423, p-value = 2.564e-06
See above for discussion on normality.
##
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.55736 40.293 1 32 3.982e-07 ***
## Group 1 0.01968 0.642 1 32 0.428746
## Sig 1 0.22793 9.447 1 32 0.004301 **
## Group:Sig 1 0.01126 0.364 1 32 0.550297
## Origin 1 0.17249 6.670 1 32 0.014582 *
## Group:Origin 1 0.00088 0.028 1 32 0.867480
## Sig:Origin 1 0.00408 0.131 1 32 0.719534
## Group:Sig:Origin 1 0.00051 0.016 1 32 0.898887
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are significant effects of Sign, \(F(1,32) = 9.447, p<0.01\), and of the satisfaction value of the origin, \(F(1,32) = 6.67, p<0.05\). As we saw in the graphs, both negative signs and denial values cause more deviation from the formal semantics.
##
## Response transformation matrix:
## Sig1
## ppSat -1
## ppDen -1
## pSat -1
## pDen -1
## mSat 1
## mDen 1
## mmSat 1
## mmDen 1
##
## Sum of squares and products for the hypothesis:
## Sig1
## Sig1 119.1176
##
## Sum of squares and products for error:
## Sig1
## Sig1 403.5
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.227925 9.446753 1 32 0.0043013 **
## Wilks 1 0.772075 9.446753 1 32 0.0043013 **
## Hotelling-Lawley 1 0.295211 9.446753 1 32 0.0043013 **
## Roy 1 0.295211 9.446753 1 32 0.0043013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response transformation matrix:
## Origin1
## ppSat -1
## ppDen 1
## pSat -1
## pDen 1
## mSat -1
## mDen 1
## mmSat -1
## mmDen 1
##
## Sum of squares and products for the hypothesis:
## Origin1
## Origin1 84.94118
##
## Sum of squares and products for error:
## Origin1
## Origin1 407.5
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.1724900 6.670227 1 32 0.014582 *
## Wilks 1 0.8275100 6.670227 1 32 0.014582 *
## Hotelling-Lawley 1 0.2084446 6.670227 1 32 0.014582 *
## Roy 1 0.2084446 6.670227 1 32 0.014582 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To analyze response confidence and what affects it, we first map the Likert-type scale to the scale {-2,-1,0,+1,+2}, and then sum up the correspondingly coded values per type of contribution link for each of the groups. We then proceed with the analysis as above. Neither visually nor through statistical tests do we find any main effect of the type of label on the response confidence of the participants.
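The recoding and summation of the confidence ratings can be illustrated with the short Python sketch below. It assumes, hypothetically, that each rating is stored as a rank from 1 (least confident) to 5 (most confident) together with the label of the contribution link; the actual wording of the Likert items is not reproduced here.

```python
def confidence_totals(ratings):
    """Sum the confidence ratings, recoded from ranks 1..5 to -2..+2, per link label."""
    totals = {}
    for label, rank in ratings:
        if not 1 <= rank <= 5:
            raise ValueError("rank must be between 1 and 5")
        totals[label] = totals.get(label, 0) + (rank - 3)  # 3 is the neutral midpoint
    return totals

# Hypothetical ratings of one participant.
print(confidence_totals([("make", 5), ("make", 4), ("hurt", 1)]))
# prints {'make': 3, 'hurt': -2}
```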
##
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## (Intercept) 1 0.52017 35.774 1 33 1.02e-06 ***
## Group 1 0.00517 0.172 1 33 0.68142
## Sig 1 0.06982 2.477 1 33 0.12505
## Group:Sig 1 0.00898 0.299 1 33 0.58823
## Intensity 1 0.03388 1.157 1 33 0.28984
## Group:Intensity 1 0.04030 1.386 1 33 0.24754
## Sig:Intensity 1 0.09285 3.378 1 33 0.07510 .
## Group:Sig:Intensity 1 0.10918 4.045 1 33 0.05254 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response transformation matrix:
## Sig1
## pp -1
## p -1
## m 1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Sig1
## Sig1 0.8022222
##
## Sum of squares and products for error:
## Sig1
## Sig1 10.68719
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0698227 2.477109 1 33 0.12505
## Wilks 1 0.9301773 2.477109 1 33 0.12505
## Hotelling-Lawley 1 0.0750639 2.477109 1 33 0.12505
## Roy 1 0.0750639 2.477109 1 33 0.12505
##
## Response transformation matrix:
## Intensity1
## pp 1
## p -1
## m -1
## mm 1
##
## Sum of squares and products for the hypothesis:
## Intensity1
## Intensity1 0.3755556
##
## Sum of squares and products for error:
## Intensity1
## Intensity1 10.70915
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0338805 1.157266 1 33 0.28984
## Wilks 1 0.9661195 1.157266 1 33 0.28984
## Hotelling-Lawley 1 0.0350687 1.157266 1 33 0.28984
## Roy 1 0.0350687 1.157266 1 33 0.28984