Randomization

Introduction

This tutorial will show you how to use the randomizer and blockTools packages for R to randomly assign units like people and villages to study arms.

Generate fake data

To get started, we need some data. Imagine that you are working with a local NGO to prospectively evaluate one of their programs. At your request, they visit 26 villages (clusters) and enroll 1000 adults between the ages of 18 and 35 to participate in their program. Here’s the enrollment data they send you.

# load
  library(learnr)
  library(tidyverse)
## Registered S3 method overwritten by 'rvest':
##   method            from
##   read_xml.response xml2
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0          ✔ purrr   0.3.2     
## ✔ tibble  2.1.3          ✔ dplyr   0.8.3     
## ✔ tidyr   0.8.3.9000     ✔ stringr 1.4.0     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
  library(randomizr)
  library(randomNames)
  library(blockTools)
  tutorial_options(exercise.timelimit = 60)
  
# setup
  set.seed(371702) # makes this reproducible
  N <- 1000        # sample size

# create data
  enrolled <-
  data.frame(partID = seq(from=1, to=N, by=1)) %>%          # fake participants
  mutate(clusterID = sample(letters[1:26], N, replace=TRUE),# village IDs
         clusterPop = sample(200:20000, N, replace=TRUE),   # village population
         female = sample(0:1, N, replace=TRUE),             # female=1, male=0
         married = sample(0:1, N, replace=TRUE),            # married-1, not=0
         age = sample(18:35, N, replace=TRUE),              # participant age
         name = randomNames(N, gender=female)               # names
         ) %>%
  select(partID, name, female, married, age, clusterID, clusterPop)
  enrolled

Randomize individuals

We we can use the complete_ra() function from the randomizer package to randomly assign individuals to study arms. In the example below we specify 2 arms labeled “treatment” and “control”. Run the code and observe the counts of participants assigned to each arm.

# randomize
  rand_ind <- 
  enrolled %>%
    select(partID, name) %>%
    mutate(arm = complete_ra(N = N,                         # N is sample size
                             num_arms = 2,
                             conditions=c("control", "treatment"))
           )
  rand_ind

# check
  rand_ind %>%
    group_by(arm) %>%
    count()

Try setting num_arms to 3, add a third condition label, and re-run the code.

Block randomization

Single blocking variable

If we want to make sure that participant sex is equally distributed across arms (more of a concern when the sample size is small), or if we want to setup a heterogeneity analysis where we look at the differential impact of the intervention by sex, we can block the randomization on sex using the block_ra() function. Run the code.

# randomize
  rand_ind_bl <- 
  enrolled %>%
    select(partID, name, female) %>%
    mutate(arm = block_ra(blocks = female,
                          num_arms = 2,
                          conditions=c("control", "treatment"))
           )
  rand_ind_bl

# check
  rand_ind_bl %>%
    group_by(arm, female) %>%
    count()

Multiple blocking variables

If we have multiple binary or categorical variables that we want to block on, we could create a variable that represents the combination of these variables and pass that to the block_ra() function. For instance, here we create a variable called femaleMarried that indicates the sex and marital status of each person. Run the code.

# randomize
  rand_ind_bl_m <- 
  enrolled %>%
    select(partID, name, female, married) %>%
    mutate(femaleMarried = ifelse(female==1 & married==1, "married, female",
                           ifelse(female==1 & married==0, "not married, female",
                           ifelse(female==0 & married==1, "married, male",
                           ifelse(female==0 & married==0, "not married, male",
                                  "??"))))
           ) %>%
    mutate(arm = block_ra(blocks = femaleMarried,
                          num_arms = 2,
                          conditions=c("control", "treatment"))
           )
  rand_ind_bl_m

# check
  rand_ind_bl_m %>%
    group_by(arm, femaleMarried) %>%
    count()

Note how our categories appear to be balanced across arms.

Multiple blocking variables, blockTools

That’s simple, but what if we wanted to block on multiple variables, including some continuous like age? One solution is to use the blockTools package to construct matched blocks and pass this blocking information to the block_ra() function. Run the code.

# create blocks using blockTools
  block.out <- block(data = enrolled, n.tr = 2, id.vars ="partID", 
                     algorithm="randGreedy",
                     block.vars = c("female", "age", "married"), 
                     verbose=FALSE) 
  
# conduct randomization
  rand_ind_bl_bt <-
  enrolled %>%
    select(partID, name, female, age, married) %>%
    mutate(block = createBlockIDs(block.out, ., id.var = "partID")) %>%
    mutate(arm = block_ra(blocks = block,
                          num_arms = 2,
                          conditions=c("control", "treatment"))
           )
  rand_ind_bl_bt

# check
  rand_ind_bl_bt %>%
    group_by(arm) %>%
    summarize(ageMean = mean(age), 
              femaleP = mean(female),
              marriedP = mean(married))

We appear to get pretty balanced arms on sex, age, and marital status.

Randomize clusters

No blocking

Let’s say we wanted to randomize villages (clusters), not individuals. We can use the cluster_ra() function instead and pass the clusterID. Run the code.

# randomize
  rand_cluster <- 
  enrolled %>%
    select(partID, name, clusterID) %>%
    mutate(arm = cluster_ra(clusters = clusterID,       # cluster is cluster ID
                             num_arms = 2,
                             conditions=c("control", "treatment"))
           )
  rand_cluster

# check
  rand_cluster %>%
    distinct(clusterID, .keep_all = TRUE) %>%
    group_by(arm) %>%
    count()

The default is to assign clusters 1:1 to the arms.

With blocking

We can randomize by clusters and include blocking information using the block_and_cluster_ra function. Try running the code.

# identify cluster blocking
  cluster_block <- data.frame(clusterID = letters[1:26],
                               clusterHigh = sample(0:1, 26, replace=TRUE,
                                                    prob = c(10/26, 16/26)))
# randomize
  rand_cluster_bl <- 
  cluster_block %>%
    mutate(clusterHigh = sample(0:1, 26, replace=TRUE),
           arm = block_and_cluster_ra(clusters = clusterID, # cluster is cluster ID
                                      blocks = clusterHigh, # clusterHigh is block
                            num_arms = 2,
                            conditions=c("control", "treatment"))
           )
  rand_cluster_bl
  
# check
  rand_cluster_bl %>%
    group_by(arm, clusterHigh) %>%
    count()

Here we see clusterHigh is balanced across arms.

To learn more

Check out the full suite of tools at Declare Design. If you need to prepare a random assignment materials in advance for on-the-fly assignment, have a look at the Randomizer web app.

Tutorial