Packages

library(readr)
library(dplyr)
library(ggplot2)
library(LexOPS)

Importing Datasets

Importing Sensorimotor Norms

The Lancaster Sensorimotor Norms are available from the OSF page.

sensorimotor <- read_csv("https://osf.io/48wsc/download")

## Rows: 39707 Columns: 45
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Word, Dominant.perceptual, Dominant.action, Dominant.sensorimotor,...
## dbl (39): Auditory.mean, Gustatory.mean, Haptic.mean, Interoceptive.mean, Ol...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s have a quick peek at the data.

sensorimotor |>
  head(5)

Word	Auditory.mean	Gustatory.mean	Haptic.mean	Interoceptive.mean	Olfactory.mean	Visual.mean	Foot_leg.mean	Hand_arm.mean	Head.mean	Mouth.mean	Torso.mean	Auditory.SD	Gustatory.SD	Haptic.SD	Interoceptive.SD	Olfactory.SD	Visual.SD	Foot_leg.SD	Hand_arm.SD	Head.SD	Mouth.SD	Torso.SD	Max_strength.perceptual	Minkowski3.perceptual	Exclusivity.perceptual	Dominant.perceptual	Max_strength.action	Minkowski3.action	Exclusivity.action	Dominant.action	Max_strength.sensorimotor	Minkowski3.sensorimotor	Exclusivity.sensorimotor	Dominant.sensorimotor	N_known.perceptual	List_N.perceptual	Percent_known.perceptual	N_known.action	List_N.action	Percent_known.action	Mean_age.perceptual	Mean_age.action	List#.perceptual	List#.action
A	2.214286	0.0000000	0.4285714	0.0000000	0.0000000	2.428571	0.0000000	0.3571429	1.071429	0.3571429	0.0000000	2.259291	0.0000000	0.9376145	0.0000000	0.0000000	2.408775	0.0000000	1.336306	2.129077	1.3363062	0.0000000	2.428571	2.934085	0.4788732	Visual	1.071429	1.097256	0.6000000	Head	2.428571	2.984370	0.3541667	Visual	14	19	0.7368421	14	21	0.6666667	36.85714	35.57143	PN_Sample_250.csv	MN_sample_250.csv
A CAPPELLA	4.333333	0.0000000	0.2222222	0.7222222	0.0000000	1.666667	0.3809524	0.4285714	2.714286	3.7142857	1.0000000	1.608799	0.0000000	0.5483189	1.3636265	0.0000000	1.909727	0.9734573	1.075706	1.901128	1.8477786	1.4491377	4.333333	4.420628	0.6240000	Auditory	3.714286	4.167341	0.4046243	Mouth	4.333333	5.414783	0.2854156	Auditory	18	19	0.9473684	21	21	1.0000000	35.72222	35.14286	PN_Sample_237.csv	MN_sample_237.csv
AARDVARK	1.625000	0.5625000	1.6250000	0.0625000	1.2500000	4.125000	0.1764706	0.7058824	2.235294	0.0588235	0.0588235	1.784190	1.2632630	1.9278658	0.2500000	1.8797163	1.258306	0.5285941	1.311712	1.921244	0.2425356	0.2425356	4.125000	4.325018	0.4391892	Visual	2.235294	2.258902	0.6727273	Head	4.125000	4.521367	0.3256773	Visual	16	18	0.8888889	17	20	0.8500000	36.06250	40.82353	PN_Sample_505.csv	MN_sample_505.csv
ABACK	1.294118	0.0588235	0.2941176	1.3529412	0.0000000	2.823529	0.0000000	0.0000000	3.272727	0.3636364	0.1818182	1.896204	0.2425356	0.9851844	1.8007351	0.0000000	2.007340	0.0000000	0.000000	1.902152	0.9244163	0.6030227	2.823529	3.006634	0.4848485	Visual	3.272727	3.274410	0.8571429	Head	3.272727	3.963989	0.3394343	Head	17	20	0.8500000	11	19	0.5789474	43.82353	42.54545	PN_Sample_365.csv	MN_sample_365.csv
ABACUS	1.555556	0.1666667	3.7222222	0.2777778	0.1111111	3.944444	0.0000000	2.4736842	2.631579	0.1052632	0.0000000	1.616904	0.5144958	1.4061025	0.6691132	0.4714045	1.304843	0.0000000	2.269812	2.191157	0.3153018	0.0000000	3.944444	4.887248	0.3920455	Visual	2.631579	3.219225	0.5050505	Head	3.944444	5.314414	0.2631682	Visual	18	19	0.9473684	19	21	0.9047619	36.77778	34.63158	PN_Sample_606.csv	MN_sample_606.csv

Importing Humour Norms

The Humour Norms are available from the GitHub page.

humour <- read_csv("https://raw.githubusercontent.com/tomasengelthaler/HumorNorms/master/humor_dataset.csv")

## Rows: 4997 Columns: 16
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): word
## dbl (15): mean, sd, n, mean_M, sd_M, n_M, mean_F, sd_F, n_F, mean_young, sd_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s have a look at this data too.

humour |>
  head(5)

word	mean	sd	n	mean_M	sd_M	n_M	mean_F	sd_F	n_F	mean_young	sd_young	n_young	mean_old	sd_old	n_old
abbey	2.292683	1.1455109	41	2.176471	1.3800043	17	2.347826	0.9820524	23	2.391304	1.1961731	23	2.166667	1.098127	18
abode	2.413793	1.1185846	29	2.100000	0.9944289	10	2.578947	1.1697953	19	2.692308	1.1821319	13	2.187500	1.046821	16
abscess	1.593750	1.0429293	32	1.625000	1.1877349	8	1.583333	1.0179548	24	1.555556	1.0416176	18	1.642857	1.081818	14
absence	1.640000	0.9521905	25	1.615385	0.9607689	13	1.666667	0.9847319	12	1.571429	0.8516306	14	1.727273	1.103713	11
abstract	2.411765	1.2819882	34	1.933333	1.0327956	15	2.789474	1.3572418	19	2.421053	1.1212983	19	2.400000	1.502379	15

Joining Data Together

Firstly, we’ll rename the Word column in the sensorimotor norms to have a lowercase “w”, so it’s consistent with the humour norms. Then, since all the words in the Lancaster norms are in uppercase (whereas those in the Humour norms are in lowercase), we’ll convert the Lancaster norms’ words to lowercase.

sensorimotor <- sensorimotor |>
  rename(word = Word) |>
  mutate(word = tolower(word))

Next, we will prefix all the features from the humour norms with “Humour.”, so they will be easily identifiable in the final dataset. We can use rename_at() and vars(-word) to add this prefix to all columns except the word column.

humour <- humour |>
  rename_at(vars(-word), ~paste("Humour", .x, sep="."))
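As an aside, in dplyr 1.0.0 and later, rename_at() is superseded by rename_with(). The following is a minimal sketch of the same prefixing step on a toy table; toy_humour and its values are made up for illustration and are not the real norms.

```r
library(dplyr)

# Hypothetical stand-in for the humour norms (illustrative values only)
toy_humour <- tibble(
  word = c("abbey", "abode"),
  mean = c(2.29, 2.41),
  sd   = c(1.15, 1.12)
)

# Prefix every column except `word` with "Humour."
toy_prefixed <- toy_humour |>
  rename_with(~ paste("Humour", .x, sep = "."), .cols = -word)

names(toy_prefixed)
# "word" "Humour.mean" "Humour.sd"
```

Either form should give the same column names; rename_at() still works, so there is no need to change the code above.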

Joining the data together is then easy with the dplyr join functions. Here we use full_join(), joining by the common column "word". Finally, we join the data to the lexops in-built dataset, as this contains features we can use to control for length and frequency. Since the words are stored in lexops in the string column, we tell left_join() that these columns should be treated as the same thing, with c("word"="string").

sens_hum <- full_join(sensorimotor, humour, by="word") |>
  left_join(lexops, by=c("word"="string"))
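To make the behaviour of the two joins concrete, here is a small sketch on toy tables. The objects toy_sens, toy_hum and toy_lex are hypothetical stand-ins for the sensorimotor norms, the humour norms and the lexops dataset, with made-up values.

```r
library(dplyr)

# Hypothetical stand-ins for the three datasets (illustrative values only)
toy_sens <- tibble(word = c("aardvark", "abacus"), Visual.mean = c(4.13, 3.94))
toy_hum  <- tibble(word = c("abacus", "abbey"), Humour.mean = c(2.50, 2.29))
toy_lex  <- tibble(string = c("abacus", "abbey", "zebra"), Length = c(6L, 5L, 5L))

# full_join() keeps words found in either norms table;
# left_join() then adds lexops-style features, matching `word` to `string`
toy_joined <- full_join(toy_sens, toy_hum, by = "word") |>
  left_join(toy_lex, by = c("word" = "string"))

toy_joined
```

The result has three rows. "aardvark" has NA for Humour.mean and Length because it is missing from the other two tables, and "zebra" is dropped because left_join() only keeps rows from the left-hand table. Words with missing values like these cannot satisfy a split or control on that variable.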

Generating Stimuli

Before we choose boundaries for our splits, we want to check the distributions of our independent variables.

sens_hum |> ggplot(aes(Visual.mean)) + geom_density()
sens_hum |> ggplot(aes(Humour.mean)) + geom_density()
sens_hum |> ggplot(aes(Visual.mean, Humour.mean)) + geom_point(alpha=0.5)
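Alongside the plots, it can help to count how many words would fall into each cell of a candidate design before committing to boundaries. The sketch below does this on a toy table with made-up values. The bin labels and the cut-offs (0 to 2 and 3.5 to 5 for visual ratings, 2 to 2.5 and 3 to 5 for humour ratings) are only an illustration, and between() is inclusive at both ends, which is assumed here rather than taken from how LexOPS treats boundaries.

```r
library(dplyr)

# Hypothetical ratings (illustrative values only)
toy <- tibble(
  word        = c("w1", "w2", "w3", "w4", "w5", "w6"),
  Visual.mean = c(0.5, 1.8, 2.7, 3.6, 4.2, 4.9),
  Humour.mean = c(2.1, 3.4, 2.3, 2.4, 3.2, 1.5)
)

bin_counts <- toy |>
  mutate(
    # Words outside both ranges get NA, as case_when() has no default here
    visual = case_when(
      between(Visual.mean, 0, 2)   ~ "low",
      between(Visual.mean, 3.5, 5) ~ "high"
    ),
    humour = case_when(
      between(Humour.mean, 2, 2.5) ~ "neutral",
      between(Humour.mean, 3, 5)   ~ "high"
    )
  ) |>
  # Keep only words that land in a bin on both variables
  filter(!is.na(visual), !is.na(humour)) |>
  count(visual, humour)

bin_counts
```

If any cell has very few words, generating matched items for that cell is likely to be slow or to fail, so it may be worth widening the boundaries.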

Finally, we can generate stimuli with our new words. We will create two levels of Visual ratings: 0:2 (low) and 3.5:5 (high), and two levels of Humour ratings: 2:2.5 (neutral, as words with consistently low humour ratings are often taboo) and 3:5 (high). We’ll control for word length exactly, and word frequency within a tolerance of -0.2:0.2.

Since we’re using our own data, we need to use the set_options() function to tell LexOPS which column contains our unique identifier, i.e., our words (id_col = "word").

stim <- sens_hum |>
  set_options(id_col = "word") |>
  split_by(Visual.mean, 0:2 ~ 3.5:5) |>
  split_by(Humour.mean, 2:2.5 ~ 3:5) |>
  control_for(Length, 0:0) |>
  control_for(Zipf.SUBTLEX_UK, -0.2:0.2) |>
  generate(25)

## Generated 1/25 (4%). 1 total iterations, 1.00 success rate.
## Generated 2/25 (8%). 2 total iterations, 1.00 success rate.
## Generated 4/25 (16%). 16 total iterations, 0.25 success rate.
## Generated 5/25 (20%). 18 total iterations, 0.28 success rate.
## Generated 6/25 (24%). 22 total iterations, 0.27 success rate.
## Generated 8/25 (32%). 25 total iterations, 0.32 success rate.
## Generated 9/25 (36%). 26 total iterations, 0.35 success rate.
## Generated 10/25 (40%). 27 total iterations, 0.37 success rate.
## Generated 11/25 (44%). 28 total iterations, 0.39 success rate.
## Generated 12/25 (48%). 30 total iterations, 0.40 success rate.
## Generated 14/25 (56%). 36 total iterations, 0.39 success rate.
## Generated 15/25 (60%). 37 total iterations, 0.41 success rate.
## Generated 16/25 (64%). 38 total iterations, 0.42 success rate.
## Generated 18/25 (72%). 40 total iterations, 0.45 success rate.
## Generated 19/25 (76%). 41 total iterations, 0.46 success rate.
## Generated 20/25 (80%). 44 total iterations, 0.45 success rate.
## Generated 21/25 (84%). 45 total iterations, 0.47 success rate.
## Generated 22/25 (88%). 46 total iterations, 0.48 success rate.
## Generated 24/25 (96%). 50 total iterations, 0.48 success rate.
## Generated 25/25 (100%). 51 total iterations, 0.49 success rate.

We can view a quick summary of our stimuli with the plot_design() function.

plot_design(stim)

Here is the list of stimuli generated for the design of visual sensorimotor ratings (A: A1 low, A2 high) by humour ratings (B: B1 neutral, B2 high), controlling for word length and frequency.

print(stim)

item_nr	A1_B1	A1_B2	A2_B1	A2_B2	match_null
1	sternum	stinker	forceps	scrotum	A2_B2
2	intrigue	prostate	skylight	sheepdog	A2_B2
3	brunt	whiff	havoc	chimp	A2_B1
4	query	yodel	specs	husky	A2_B1
5	penance	bullion	imprint	charade	A2_B2
6	whimper	ragtime	fielder	panties	A1_B1
7	angst	oomph	lilac	bulge	A2_B1
8	buffer	tingle	skater	pounce	A1_B1
9	creed	clunk	leech	putty	A1_B1
10	credence	gumption	smuggler	coupling	A2_B2
11	colic	bebop	miser	dingo	A1_B1
12	calm	joke	moon	shit	A1_B1
13	omen	boon	info	mutt	A2_B1
14	gust	funk	scab	ogre	A2_B2
15	germ	jinx	bead	boob	A1_B2
16	creak	nymph	pecan	smirk	A1_B1
17	gossip	jingle	rowing	donkey	A1_B1
18	anthem	squawk	anchor	cookie	A2_B1
19	rap	wit	dam	bra	A1_B2
20	mere	tang	lace	zoom	A2_B1
21	aura	whim	font	tutu	A1_B2
22	sinus	chirp	lapel	gourd	A1_B2
23	tempo	fluke	miner	pixie	A2_B2
24	gripe	hooky	rotor	girth	A1_B2
25	fore	burp	halo	hoof	A1_B2

Citing Sources

The cite_design() function suggests papers that you should cite once you have generated your stimuli. Note that for variables LexOPS does not recognise, such as those from custom data, the variable will still be listed as needing a citation, but you will have to find the citation yourself.

cite_design(stim)

## Please also cite LexOPS: Taylor, Beith and Sereno (2020), http://doi.org/10.3758/s13428-020-01389-1

var	measure	source	url
Visual.mean	Custom Measure	Custom Source	Unknown
Humour.mean	Custom Measure	Custom Source	Unknown
Length	Length (Number of Characters)	NA	NA
Zipf.SUBTLEX_UK	Frequency in Zipf (Zipf=log10(frequency per million)+3)	SUBTLEX-UK (van Heuven, Mandera, Keuleers, & Brysbaert, 2014)	https://doi.org/10.1080/17470218.2013.850521

Using Data from Custom Sources