Evaluate & Compare
To objectively demonstrate the value of artworks and research, it is not enough to say "it seems good" --- you need to show how experiencers felt using numbers and data. This section introduces evaluation methods from three perspectives: what to measure, how to measure it, and how to compare and dig deeper.
What to Measure
First, let's clarify what we are actually measuring.
1. UX (User Experience)
UX refers to everything a user feels when using or experiencing a product, service, or artwork. "It was easy to use," "It was interesting," "I want to experience it again" --- all of these are part of UX.
UX evaluation of interactive products is mainly divided into the following two qualities:
- Pragmatic quality --- Qualities related to achieving a goal: "Is it easy to use?" "Is it understandable?" "Is it efficient?" For a smartphone app, this means "Can I quickly reach the desired function?"
- Hedonic quality --- Qualities related to emotional and affective satisfaction: "Is it interesting?" "Is it novel?" "Is it attractive?" For a smartphone app, this means "Is it exciting to use?"
Why divide into two?
There are products that are "easy to use but boring" and products that are "hard to use but interesting." By evaluating both qualities separately, you can clearly identify what the strengths and areas for improvement of a work are.
2. Impression
When evaluating the impression received from a subject, we assess it through multiple adjectives and onomatopoeia. For example, the impression of a work is quantified on scales of opposing word pairs such as "bright <-> dark" and "soft <-> hard." A single adjective cannot capture the full impression, so by combining multiple adjective pairs, we can objectively grasp multifaceted impressions.
3. Preference
The intuitive feeling that something is good, fitting, or appropriate. Unlike impressions, preference includes a "personal value judgment." Even for the same work, preferences vary greatly between individuals.
4. Emotion
The PAD (Pleasure-Arousal-Dominance) model, whose first axis is also called "valence," is well known for measuring emotions:
| Dimension | Meaning | Example |
|---|---|---|
| Pleasure / Valence | Positive <-> Negative | Happy <-> Sad |
| Arousal | Excited <-> Calm | Thrilling <-> Relaxed |
| Dominance | Dominant <-> Submissive | In control <-> Overwhelmed |
Emotions received from content are often evaluated on two axes: valence and arousal. For example, horror movies are positioned as "unpleasant x high arousal," and healing music as "pleasant x low arousal."
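The two-axis classification above can be sketched as a small function. The thresholds and the example ratings are illustrative assumptions, not measured values:

```python
def emotion_quadrant(valence, arousal):
    """Map a (valence, arousal) pair, each in [-1, 1], to a quadrant
    of the valence x arousal plane described above."""
    v = "pleasant" if valence >= 0 else "unpleasant"
    a = "high arousal" if arousal >= 0 else "low arousal"
    return f"{v} x {a}"

# Illustrative ratings (hypothetical, not measured data)
print(emotion_quadrant(-0.7, 0.8))  # horror movie -> "unpleasant x high arousal"
print(emotion_quadrant(0.6, -0.5))  # healing music -> "pleasant x low arousal"
```

In practice each axis would come from a rating scale (e.g., SAM, introduced below) rescaled to a common range before classification.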
How to Measure
1. SD (Semantic Differential) Method
A method for objectively quantifying and analyzing the impressions people receive from artworks, products, etc.
How it works: Prepare multiple pairs of opposing adjectives (adjective pairs) such as "bright - dark" and "artificial - natural," and have respondents rate each on a 5- or 7-point scale indicating which adjective they lean toward.
Interpreting results: Connecting the mean values of each adjective pair in a line graph reveals the "impression profile" of the subject. By overlaying multiple works or conditions on the same graph, differences in impressions become immediately apparent.
Example
When comparing impressions of a particular acoustic artwork (Work A) and conventional speaker playback (Work B):
- Work A: Leans toward "fantastic," "dynamic," "comfortable"
- Work B: Leans toward "realistic," "static," "unsettling"
-> This profile suggests that Work A provides an immersive, non-everyday experience
References: Impression evaluation of AR installations, Impression evaluation of virtual forest bathing content, Graphing SD method survey results
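The mean-profile computation behind an SD graph takes only a few lines; the adjective pairs, ratings, and work names below are hypothetical placeholders:

```python
from statistics import mean

# Hypothetical 7-point SD ratings (1 = left adjective, 7 = right adjective)
# from four respondents per condition; all values are illustrative.
adjective_pairs = ["dark - bright", "static - dynamic", "unsettling - comfortable"]

work_a = {"dark - bright": [6, 5, 6, 7],
          "static - dynamic": [6, 6, 5, 7],
          "unsettling - comfortable": [5, 6, 6, 6]}
work_b = {"dark - bright": [3, 4, 3, 4],
          "static - dynamic": [2, 3, 3, 2],
          "unsettling - comfortable": [3, 2, 3, 3]}

# The impression profile is simply the per-pair mean; plotting these means
# as connected lines, one line per work, gives the profile graph described above.
for pair in adjective_pairs:
    print(f"{pair:26s}  A: {mean(work_a[pair]):.2f}  B: {mean(work_b[pair]):.2f}")
```

Overlaying the two lines on one chart makes the gap on each adjective pair visible at a glance.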
2. Semi-Structured Interview
A research method where questions are prepared in advance, but additional questions are flexibly added or modified based on the respondent's answers to dig deeper.
Difference from surveys: Surveys limit responses to "yes/no" or "5-point scales," making unexpected discoveries difficult. In interviews, you can dig deeper with questions like "Why did you feel that way?" or "At which specific moment?" allowing you to grasp the quality of experiences that numbers alone cannot reveal.
What "semi-structured" means: Completely free conversation (unstructured) tends to go off-topic, while strictly fixed questions (structured) prevent digging deeper. Semi-structured interviews are the middle ground.
3. Focus Group Interview (FGI)
A research method where one moderator conducts a roundtable-style interview with a group of approximately 4-8 people.
Advantages: Participants stimulate each other's responses, generating more diverse opinions than one-on-one interviews. Discussions come alive with reactions like "Oh, I thought so too!" or "Actually, it was the opposite for me..."
Caution: There is a risk that a dominant participant may pull everyone toward the same opinion.
4. Specific Measurement Methods (Scales & Questionnaires)
In the research world, many measurement tools (scales) with established reliability and validity already exist. Choose the appropriate one based on your purpose.
| Scale | What It Measures | Features |
|---|---|---|
| AttrakDiff | Pragmatic and hedonic quality of products | A classic UX scale evaluating both usability and attractiveness |
| UEQ / UEQ-S | Overall UX | The 8-item short version (UEQ-S) is quick and easy to administer |
| SAM | Emotion (valence, arousal) | Responses are given using illustrations (manikins) rather than words, making it language-independent |
| POMS2 | Mood states | Measures tension, depression, anger, fatigue, and confusion, plus positive dimensions such as vigor |
| Temporal Dominance of Emotions (TDE) | Temporal changes in emotion | Captures which emotion is dominant at each moment and how this shifts over the course of an experience |
| PRS | Perceived restorativeness of places | Evaluates how much psychological rest a space provides across 4 factors |
| PANAS | Positive and negative affect | A scale that evaluates positive and negative affect as two independent axes |
| ME (Magnitude Estimation) method | Sensory intensity | Respondents indicate "how many times stronger" a sensation is compared to a reference stimulus, directly quantifying sensory intensity |
Methods for Comparing & Digging Deeper
To more deeply understand evaluation results, we use the following comparison and analysis methods.
Comparison Methods
- Vary elements within the artwork --- For example, prepare a version with sound and a version without sound (a "dummy" version of the artwork) to verify the effect of sound
- Vary the medium --- Investigate how differences in presentation methods affect impressions: monitor vs. projector, headphones vs. speakers, etc.
- Compare with existing works --- By evaluating your work alongside similar existing works or products, you can highlight the distinctive features of your work
Statistical Analysis Methods
- ANOVA (Analysis of Variance) --- A method for testing whether there are statistically significant differences in the means of 3 or more groups. For example, you can test whether there are differences in impression ratings among Works A, B, and C. For comparing 2 groups, use a t-test
- Factor Analysis --- A method for organizing the many adjective-pair ratings obtained from the SD method and identifying a small number of underlying factors (categories). For example, if "bright," "vivid," and "flashy" cluster together, you might name it the "activity" factor
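A one-way ANOVA F statistic can be computed by hand to see what the test actually compares: variance between group means versus variance within groups. The three rating lists below are invented for illustration; in real analyses a library such as `scipy.stats.f_oneway` would also return the p-value:

```python
from statistics import mean

def one_way_anova_f(groups):
    """Compute the F statistic for a one-way ANOVA.

    F = (between-group mean square) / (within-group mean square).
    A large F suggests the group means differ more than within-group
    noise alone would explain; compare against an F distribution
    with (k - 1, n - k) degrees of freedom for a p-value.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand = mean(x for g in groups for x in g)

    # Between: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical 7-point impression ratings for Works A, B, C (illustrative)
ratings = [[6, 7, 6, 5, 7], [4, 5, 4, 3, 4], [5, 5, 6, 4, 5]]
print(f"F = {one_way_anova_f(ratings):.2f}")  # F ≈ 10.71 for this data
```

With only two groups the same question reduces to a t-test, as noted above (and F equals t squared in that case).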
Data Exploration Methods
- Conjoint Analysis --- A method for identifying which elements (color, material, shape, etc.) among the multiple components of a product or service, and to what degree, influence preference. It yields insights like "color has the greatest influence on preference"
- Text Mining --- A method for statistically analyzing frequently occurring words and word co-occurrences from survey free-text responses or interview transcripts. Effective for identifying trends from large volumes of text data
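A minimal conjoint-style sketch: dummy-code the attribute levels and fit ordinary least squares, whose coefficients are the part-worth utilities of each level. The profiles, attribute names, and ratings below are invented for illustration:

```python
import numpy as np

# Hypothetical profiles: (color, material) and a mean preference rating
profiles = [
    ("red",  "wood",  6.0),
    ("red",  "metal", 4.5),
    ("blue", "wood",  5.0),
    ("blue", "metal", 3.5),
]

# Dummy-code each attribute (1 if the level is present, else 0) plus an intercept
X = np.array([[1.0, c == "red", m == "wood"] for c, m, _ in profiles], dtype=float)
y = np.array([r for _, _, r in profiles])

# Ordinary least squares: coefficients are part-worth utilities
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, u_red, u_wood = coef
print(f"part-worth red:  {u_red:.2f}")   # relative to the blue baseline
print(f"part-worth wood: {u_wood:.2f}")  # relative to the metal baseline
```

Comparing the utility ranges per attribute yields the "which element matters most" insight: here the material coefficient is larger than the color one, so material drives preference more in this toy data.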
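The simplest form of text mining, a word-frequency count, needs only the standard library. The free-text responses and stop-word list below are illustrative:

```python
from collections import Counter
import re

# Hypothetical free-text survey responses (illustrative)
responses = [
    "The sound felt immersive and the space was immersive too",
    "Immersive visuals, but the sound was too loud",
    "Loved the sound design and the immersive atmosphere",
]

# Tokenize to lowercase words and drop common function words
stopwords = {"the", "and", "was", "but", "too", "a"}
words = [w for r in responses for w in re.findall(r"[a-z]+", r.lower())
         if w not in stopwords]

freq = Counter(words)
print(freq.most_common(3))
```

Real analyses add morphological analysis (essential for Japanese text), co-occurrence counts, and visualization, but the core loop of tokenize, filter, count is the same.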