In all the previous examples, I use LexOPS to match item-wise. In this example, I show code for item-wise matching without using LexOPS. The idea is to show how item-wise matching with tolerances actually works.

As a simple example, we’ll recreate the design from 01 - Item-Wise Matching using hand-written code.

Setup

Load Packages

library(tidyverse)

Import the Data

stim_pool <- read_csv("stim_pool.csv")

Item-Wise Matching from Scratch

Let’s create a list of 100 young and 100 old faces, matched on face width.

Step 1: Identify conditions

First, we define our conditions. Let’s say young faces are <28 years old, and old faces are >50 years old. We can use dplyr’s mutate() and case_when() to create our conditions.

stim_pool <- mutate(stim_pool, cond = case_when(age<28 ~ "young", age>50 ~ "old"))

This gives us >1000 candidate items for each of our conditions.

count(stim_pool, cond)

## # A tibble: 3 x 2
##   cond      n
##   <chr> <int>
## 1 old    1116
## 2 young  1202
## 3 <NA>   2682

Step 2: Decide on Tolerance

The first step is to decide how closely we want to match face width. Let’s imagine that we decide to match face width within ±0.5 pixels. For reference later, we’ll store this in a vector:

tol <- c(-0.5, 0.5)

Step 3: Find Matches

Here is the code for finding 100 pairs of young and old faces matched for face width.

On each iteration, we:

Get a list of all candidate faces which haven’t already been selected.
Pick a random item from the young condition.
Try pick a random item from the old condition which is suitably close (within ±0.5 face width).
If there was no suitable match, skip to the next iteration. Otherwise, record the match.

We keep going until we have n_stim (100) matched pairs. To do this, we can use a while loop. Normally you would also want to add break statements to handle cases when the method fails, but I’ve left these out for now to keep the code simple.

This will give us a list of matched pairs in the form list(c("stim_42", "stim_54"), c("stim_972", "stim_43")), of length 100.

# number of stimuli we want
n_stim <- 100

# the conditions we're interested in
age_conds <- c("young", "old")

# keep track of how many matches have been found
n_found <- 0
matches <- list()

while (n_found < n_stim) {
  # 1. filter the pool to exclude items already matched
  iter_pool <- filter(stim_pool, !stim_id %in% unlist(matches))
  
  # 2. pick a cond_i item at random, and get it's data
  young_item <- iter_pool %>%
    filter(cond == "young") %>%
    slice_sample(n = 1)
  
  # 3. filter the pool to find an item suitably close to the match null
  old_item <- iter_pool %>%
    filter(
      cond == "old",
      face_width >= young_item$face_width + tol[1],
      face_width <= young_item$face_width + tol[2]
    ) %>%
    slice_sample(n = 1)
  
  # 4. if no suitable match, end the iteration...
  if (nrow(old_item) == 0) next
  
  # ...otherwise, record the match
  matches[[length(matches) + 1]] <- c(young = young_item$stim_id, old = old_item$stim_id)
  n_found <- n_found + 1
}

Step 4: Check Stimuli

Now we have a list of all the matched pairs, we can put this in a dataframe like so .

match_df <- map_df(1:length(matches), function(i) {
  tibble(
    pair_nr = i,
    young = matches[[i]]["young"],
    old = matches[[i]]["old"]
  )
})

match_df

## # A tibble: 100 x 3
##    pair_nr young     old      
##      <int> <chr>     <chr>    
##  1       1 face_2355 face_1997
##  2       2 face_4952 face_2646
##  3       3 face_169  face_4147
##  4       4 face_1637 face_911 
##  5       5 face_1656 face_3414
##  6       6 face_2181 face_3664
##  7       7 face_4674 face_295 
##  8       8 face_553  face_1295
##  9       9 face_1548 face_601 
## 10      10 face_1982 face_2997
## # ... with 90 more rows

We can add to this the face width values associated with each young and old item, and calculate the difference.

stim_pool_young <- stim_pool %>%
  rename(face_width_young = face_width) %>%
  select(stim_id, face_width_young)

stim_pool_old <- stim_pool %>%
  rename(face_width_old = face_width) %>%
  select(stim_id, face_width_old)

match_df <- match_df %>%
  left_join(stim_pool_young, by=c("young" = "stim_id"))  %>%
  left_join(stim_pool_old, by=c("old" = "stim_id")) %>%
  mutate(face_width_diff = face_width_young - face_width_old)

match_df

## # A tibble: 100 x 6
##    pair_nr young     old       face_width_young face_width_old face_width_diff
##      <int> <chr>     <chr>                <dbl>          <dbl>           <dbl>
##  1       1 face_2355 face_1997             62.6           62.7         -0.160 
##  2       2 face_4952 face_2646             66.1           66.1          0.0600
##  3       3 face_169  face_4147             59.8           60.0         -0.180 
##  4       4 face_1637 face_911              63.2           63.6         -0.400 
##  5       5 face_1656 face_3414             68.2           68.4         -0.190 
##  6       6 face_2181 face_3664             61.6           61.8         -0.200 
##  7       7 face_4674 face_295              67.4           67            0.450 
##  8       8 face_553  face_1295             60.0           59.8          0.200 
##  9       9 face_1548 face_601              66.5           66.9         -0.410 
## 10      10 face_1982 face_2997             61.0           60.8          0.150 
## # ... with 90 more rows

Here is the distribution of differences between the young and old faces. Notice that they are between -0.5 and 0.5.

match_df %>%
  ggplot(aes(face_width_diff)) +
  geom_histogram(binwidth = 0.1) +
  scale_x_continuous(n.breaks=10)

We can also create a plot similar to plot_design() to show the distributional similarity at the same time:

match_df %>%
  select(young, old, pair_nr) %>%
  pivot_longer(c(young, old), names_to="cond", values_to="stim_id") %>%
  left_join(select(stim_pool, stim_id, face_width), by="stim_id") %>%
  ggplot(aes(cond, face_width)) +
  geom_violin() +
  geom_point() +
  geom_line(aes(group = pair_nr))

Summary

I decided to write LexOPS because it lets you match items with readable and easily editable code, but as you can see the method is actually very simple! You could adapt the code written here to create any stimuli you could create with LexOPS, including the other examples provided.

08 - Item-Wise Matching from Scratch