1 AMP summary

1.1 Lesson objectives:

Through AMP lessons 1-4 we have covered some of the key tenets of statistics in a maths-lite way.

1.2 Glossary and key terms:

  • Continuous, categorical, ordinal variables - types of variables common in data analysis (see link)

  • P-value: The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true (i.e. assuming there is no real difference between populations)

  • Power: The probability that we are able to detect a difference between two populations, given a certain sample size, distribution and error rate. Note that power also has an effect on how robust our results are.

  • Effect Size: The standardized difference between 2 groups, usually expressed in units of standard deviations. So an effect size of 1 is equal to a difference of 1 standard deviation between group means.
  • Hypothesis test: A statistical test. Each hypothesis test has two parts: a null hypothesis, which says that the two populations are the same (i.e. there is 0 difference), and an alternative hypothesis, which says that there is a difference between the populations. Often the null hypothesis is called H0 and the alternative hypothesis is called H1.

  • Odds-ratio (OR): The OR represents the odds that an outcome will occur given a particular variable, compared to the odds of the outcome occurring in the absence of that variable

  • Logistic regression: Logistic regression is used to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
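As a quick illustration of the last two terms, here is a minimal sketch with made-up data: fit a logistic regression with glm() and exponentiate the coefficient to obtain an odds-ratio.

#made-up data: does a binary predictor (e.g. fertiliser use) predict a binary outcome?
set.seed(42)
fert <- rbinom(200, 1, 0.5)
good_yield <- rbinom(200, 1, ifelse(fert == 1, 0.7, 0.5))

#logistic regression: coefficients are on the log-odds scale
mod <- glm(good_yield ~ fert, family = binomial)
summary(mod)

#exponentiate the coefficient to get the odds-ratio for fertiliser use
exp(coef(mod)["fert"])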

2 Some new concepts

2.1 Multiple comparison problem

The multiple comparison problem (MCP) is a key idea in statistics that relates to hypothesis testing (both direct hypothesis testing and the hypothesis testing that occurs in regressions). The issue, essentially, is that when we test many hypotheses, we increase our chances of generating a false positive:

plist = c() # empty list 
#simulate many many draws of samples from a population
for(i in seq(1:100000)){
  r1 = rnorm(200,50,10)
  r2 = rnorm(200,50,10)
  p = t.test(r1,r2,var.equal = TRUE)$p.value #calculate a pval
  plist = c(p,plist) # append the pval to the list each time
  
}

#lets plot a very specific histogram
#i want to plot my list of pvalues, but I want each bin to be 0.05 long. So we should end up with 20 bins evenly spaced from 0 to 1
h <-  hist(plist, breaks=seq(0, 1, by=0.05))
#now I want the y axis to be %, so we will divide the count of each bar by the total count
h$density = h$counts/sum(h$counts)*100
# now lets plot it 
plot(h,freq=FALSE, main="P-values derived from 2 identical populations", xlab="P-value", ylab="Frequency (%)", col="black", border="white", labels=c("<0.05", rep(NA,19)), ylim=c(0,6) )

The histogram above represents the results of 100,000 t-tests on pairs of samples drawn from the same population. As can be seen, 5% of the p-values fall below our 0.05 threshold. This means we have a 5% chance of falsely rejecting the null hypothesis purely by chance (in fact, this is a common interpretation of what the p-value threshold means)! Put another way, if we ran 20 different hypothesis tests on samples drawn from a single population, we would expect 1 of those 20 to come back significant purely by chance. So, if our treatment has zero effect, but we run many hypothesis tests looking for differences between treatment and control samples, then we would expect 1 in 20 results to be a false positive.

Note that if we set our p-value threshold to 0.1 (rather than the usual 0.05), then 10% of such tests would come back falsely significant.
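Another way to see the problem is the chance of getting at least one false positive anywhere in the study, which grows rapidly with the number of tests. A quick back-of-the-envelope calculation (assuming independent tests):

#probability of at least one false positive across m independent tests at alpha = 0.05
m <- c(1, 5, 10, 20)
round(1 - (1 - 0.05)^m, 2) # ~0.05, 0.23, 0.40, 0.64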

This means that we have to take action to ensure we are not generating false positives; the two main actions are:

  • Limit the number of hypotheses in any study to the bare minimum
  • Apply a correction if more than one hypothesis is being tested in a study

There are many corrections that exist that will reduce the chance of a false positive, often by lowering the p-value threshold. One of the harshest corrections is known as the Bonferroni correction, which is simply your original p-value threshold divided by the number of hypotheses. So if you run 5 hypotheses with a usual threshold of 0.05, you get \(0.05/5 = 0.01\); you would then use 0.01 as your new p-value threshold.
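R's built-in p.adjust() can apply the Bonferroni correction for you by inflating the p-values rather than lowering the threshold; the two approaches give the same decisions. A minimal sketch with made-up p-values:

#five hypothetical raw p-values
praw <- c(0.004, 0.02, 0.03, 0.30, 0.60)

#option 1: lower the threshold to 0.05/5 = 0.01
praw < 0.05/length(praw)

#option 2: inflate the p-values and keep the usual 0.05 threshold
p.adjust(praw, method = "bonferroni") < 0.05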

Remember also that our p-value threshold (aka alpha) forms part of our power calculation. This means that after applying the Bonferroni correction you need to recalculate power with alpha set to the new, lower value.

Testing multiple hypotheses (and correcting for them) will therefore drastically reduce a study's power, as the simulation below shows:

cohen_d <- function(d1,d2) {  
  
  m1 <- mean(d1, na.rm=TRUE)
  m2 <- mean(d2, na.rm=TRUE)
  s1 <- sd(d1, na.rm=TRUE)
  s2 <- sd(d2, na.rm=TRUE)
  spo <- sqrt((s1**2 + s2**2)/2)
  d <- (m1 - m2)/spo
  rpb <- d / sqrt((d**2)+4)
  ret <- list("rpb" = rpb, "effectsi" = d)
  return(ret)  } 

library(pwr)

norm <- rnorm(200,40,10)
norm2 <- norm*1.1
inp <- cohen_d(norm, norm2)



#graph alpha levels against power and sample size
power_out <- c() # initialise this empty so we can append to it
power_out2 <- c() # initialise this empty so we can append to it
sample_sizes <- c(2,4,6,10,20,30,40,50,70,90,110,130)
for(i in sample_sizes){
  x <- pwr.t.test(n= i, d= inp$effectsi, power=NULL, alternative="two.sided",sig.level = 0.05)$power
  power_out <- c(power_out, x)
  x <- pwr.t.test(n= i, d= inp$effectsi, power=NULL, alternative="two.sided",sig.level = 0.01)$power
  power_out2 <- c(power_out2, x)
  
  
}

df = data.frame(sample=sample_sizes, pwr1=power_out, pwr2=power_out2)

library(ggplot2)
 ggplot(df) + geom_line(aes(x=sample, y= pwr1,colour="alpha 0.05"),size=2)  + geom_line(aes(x=sample, y= pwr2, colour="alpha 0.01"),size=2)  + ggtitle("Power and alpha") + xlab("Sample size") + ylab("Power")

The Bonferroni correction is very conservative, and is highly recommended when even a single false positive would be an issue (e.g. a high-risk trial, or a trial for a major program change).

Another, less harsh, correction method is the Benjamini and Hochberg (1995) method: run your tests, order all of the p-values in increasing order (smallest p-value first) and rank them from 1 to \(m\) (where \(m\) is the total number of tests). Once you have ranked the values you compute the BH critical value for each test with:

  • \(BH = (i/m)Q\) - where \(i\) is the rank, \(m\) is the total number of tests, and \(Q\) is the false discovery rate you choose (i.e. what fraction of false positives are you comfortable with?).

The largest p-value that is smaller than its BH critical value is your cutoff: that test, and every test with a smaller p-value, is considered significant; all tests with larger p-values are non-significant.

For example, for a false discovery rate of 0.25:

# from http://rcompanion.org/rcompanion/f_01.html

library(knitr)

Input = ("
Food               Raw.p
 Blue_fish         .34
 Bread             .594
 Butter            .212
 Carbohydrates     .384
 Cereals_and_pasta .074
 Dairy_products    .94
 Eggs              .275
 Fats              .696
 Fruit             .269
 Legumes           .341
 Nuts              .06
 Olive_oil         .008
 Potatoes          .569
 Processed_meat    .986
Proteins           .042
Red_meat           .251
Semi-skimmed_milk  .942
Skimmed_milk       .222
Sweets             .762
Total_calories     .001
Total_meat         .975
Vegetables         .216
White_fish         .205
White_meat         .041
Whole_milk         .039
")

Data = read.table(textConnection(Input),header=TRUE)

Data = Data[order(Data$Raw.p),]


### Check if data is ordered the way we intended


Data$rank <- seq(1:length(Data$Raw.p))
Data$BH <- (Data$rank/length(Data$rank))*0.25
kable(Data, digits=4)
Food Raw.p rank BH
20 Total_calories 0.001 1 0.01
12 Olive_oil 0.008 2 0.02
25 Whole_milk 0.039 3 0.03
24 White_meat 0.041 4 0.04
15 Proteins 0.042 5 0.05
11 Nuts 0.060 6 0.06
5 Cereals_and_pasta 0.074 7 0.07
23 White_fish 0.205 8 0.08
3 Butter 0.212 9 0.09
22 Vegetables 0.216 10 0.10
18 Skimmed_milk 0.222 11 0.11
16 Red_meat 0.251 12 0.12
9 Fruit 0.269 13 0.13
7 Eggs 0.275 14 0.14
1 Blue_fish 0.340 15 0.15
10 Legumes 0.341 16 0.16
4 Carbohydrates 0.384 17 0.17
13 Potatoes 0.569 18 0.18
2 Bread 0.594 19 0.19
8 Fats 0.696 20 0.20
19 Sweets 0.762 21 0.21
6 Dairy_products 0.940 22 0.22
17 Semi-skimmed_milk 0.942 23 0.23
21 Total_meat 0.975 24 0.24
14 Processed_meat 0.986 25 0.25

In the table above, Proteins has a raw p-value of 0.042, which is below its BH value of 0.05; the next entry, Nuts, has a p-value of 0.06, which is not below its BH value of 0.06. Nuts is therefore the first case where p is not less than (i/m)Q, so we take everything from Total_calories down to Proteins to be significant, and everything else non-significant.

Note that as we set our false discovery rate to 0.25, we would expect up to one quarter of the significant results above to be false positives. This method is therefore best used when you have more room for error.
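Note also that R's built-in p.adjust() implements the same procedure: with method = "BH" it returns FDR-adjusted p-values that can be compared directly to your chosen Q. For the table above this reproduces the same set of significant foods:

#same decisions as the manual ranking above
Data$BH.adjusted <- p.adjust(Data$Raw.p, method = "BH")
Data$Food[Data$BH.adjusted < 0.25]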

Further reading

2.2 Confidence intervals

We have discussed confidence intervals throughout the AMP, without really pausing to consider what they are. The most common interpretation is:

Confidence intervals give a range of values that have X% probability of containing the true population value

So a 95% confidence interval indicates that there is a 95% chance that our population mean lies within the range given.
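In R, most test functions return a confidence interval directly. A minimal sketch with made-up data:

#made-up yield data
set.seed(1)
yields <- rnorm(50, mean = 500, sd = 50)

#t.test() returns the 95% confidence interval for the mean by default
t.test(yields)$conf.int

#ask for a different level, e.g. 90%
t.test(yields, conf.level = 0.90)$conf.int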

2.3 Tukey’s test

A useful hypothesis test we have not yet introduced is Tukey's range test. This test is akin to running multiple pairwise t-tests across all combinations of factor levels while correcting for the multiple comparison problem. Note that, as it is based on the studentized range (a relative of the t-distribution), it carries the same assumptions as the t-test.
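A minimal sketch with made-up data for three groups - fit a one-way ANOVA with aov(), then pass it to TukeyHSD():

#made-up data: a control and two treatments
set.seed(1)
dtuk <- data.frame(group = rep(c("control","treatA","treatB"), each = 40),
                   yield = c(rnorm(40,100,10), rnorm(40,105,10), rnorm(40,112,10)))

#one-way ANOVA, then Tukey's range test over all pairwise comparisons
fit <- aov(yield ~ group, data = dtuk)
TukeyHSD(fit) #adjusted p-values and confidence intervals for each pair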

2.4 Random block design

The randomized complete block design (RCBD) is commonly used in agricultural trials; it is similar to an RCT but accounts for variation due to spatial effects in a field (e.g. different soil properties, drainage, etc.).

A field is divided into blocks, each of which will contain each treatment and control scenario exactly once. This means our model will look something like:

  • \(Y_{ij} = c + T_i + B_j + e\)

Where \(Y_{ij}\) is the outcome variable for treatment \(i\) and block \(j\), \(c\) is the intercept term, \(T_i\) is the effect of treatment \(i\), \(B_j\) is the effect of block \(j\) and \(e\) is random error. Hopefully at this point the similarities between the randomized block formula and the mixed-effect model formula jump out (see AMP lesson 3), and it is clear that we can use a mixed-effect model to account for the random effect of blocks.

So in the nlme package, we would have:

  • lme(fixed=y~ Treatment + A + B,data=data, random=~1|Block,na.action=na.exclude,method="ML") - where A and B stand for any additional fixed-effect covariates you want to adjust for

There is a great overview of the randomized block design here that I recommend for delving into the theory.
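As a rough, made-up illustration of fitting this model (treatment as the fixed effect, block as a random intercept):

library(nlme)

#made-up RCBD data: each of 8 blocks contains each treatment exactly once
set.seed(1)
rcbd <- expand.grid(Block = factor(1:8), Treatment = c("control","treat"))
block_effect <- rnorm(8, 0, 20) #spatial variation between blocks
rcbd$yield <- 500 + 30*(rcbd$Treatment == "treat") + block_effect[rcbd$Block] + rnorm(nrow(rcbd), 0, 10)

#block enters as a random intercept, exactly as in the template above
m_rcbd <- lme(fixed=yield ~ Treatment, data=rcbd, random=~1|Block, na.action=na.exclude, method="ML")
summary(m_rcbd)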

2.5 Advanced thoughts on statistical power

So far in the AMP, we have considered statistical power simply in terms of false negatives (power is the probability of detecting a true effect, so low power means a high chance of a false negative result). I want to add some nuance now and explain the other ways that low power affects a study:

  • Positive predictive value
    • The PPV of a study is the proportion of positives obtained that we expect to be true positives. The lower the power of a study, the lower the probability that an observed effect that passes the required threshold for claiming a discovery (such as p < 0.05) actually reflects a true effect. In other words, low power affects not just your false negatives: with low power, a larger share of the "significant" results you do find will be false positives. We can calculate the PPV of a discovery as:

    • \(PPV = (P \times R)/(P \times R + \alpha)\) - where \(P\) is power, \(R\) is the pre-study odds that an effect is non-null, and \(\alpha\) is our p-value threshold.

    • For example, suppose we test 5 hypotheses, and the pre-study odds are that 1 in 5 hypotheses has a non-null effect (\(R = 1/(5-1) = 0.25\)), our \(\alpha\) is set to 0.05 and our power is 0.2. In this case our PPV is \((0.2 \times 0.25)/(0.2 \times 0.25 + 0.05) = 0.5\). That is, 50% of our "significant" hypotheses will actually be incorrect! With a power of 0.8, 80% of our claimed discoveries will actually be correct (a short R sketch of this calculation follows this list).

    • This makes low power even more dangerous - a low power means you are not only likely to miss significant effects, but the effects you do find are much more likely to actually be false!

  • Winner’s curse: The winner’s curse refers to the phenomenon whereby the ‘lucky’ analyst who makes a discovery is cursed by finding an inflated estimate of that effect. The winner’s curse occurs when thresholds, such as statistical significance, are used to determine the presence of an effect and is most severe when thresholds are stringent and studies are too small and thus have low power.
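First, a small sketch of the PPV calculation described above, reproducing the two numbers from the worked example:

#PPV = (power * R) / (power * R + alpha)
ppv <- function(power, R, alpha = 0.05){ (power * R) / (power * R + alpha) }

ppv(power = 0.2, R = 0.25) # 0.5 - only half of the "discoveries" reflect true effects
ppv(power = 0.8, R = 0.25) # 0.8 - most discoveries reflect true effects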

We can demonstrate Winner’s curse below:

library(pwr)
pvals = c()
dvals = c()
pw = c()
for(i in seq(1,1000)){
  d1 = rnorm(100,100,10)
  d2 = rnorm(100,102,10)
  pvals = c(pvals,t.test(d1,d2)$p.value)
  dvals = c(dvals,cohen_d(d1,d2)$effectsi)
  pw = c(pw,pwr.t.test(n = 100, d= cohen_d(d1,d2)$effectsi, power=NULL, type="two.sample")$power)

}

df = data.frame(power=pw, p=pvals, d=dvals)

# average effect size across all runs (close to the true Cohen's d of about -0.2)
co = round(mean(df$d),3)

# keep only the runs that came back "significant"
df_sub = subset(df, p<0.05)

Note that we have set the simulation to have a true effect size of 2 units, or a Cohen's d of about 0.2 in magnitude. However, if we run the experiment many times and report the average effect size for the significant findings only, we get -0.3703037 - an over-estimate of the true effect size by a factor of roughly 1.85.

This occurs because of the thresholding inherent in frequentist significance testing: if a true effect is modest, and a low-powered trial is run to estimate its effect size, then the only way a hypothesis will return as "significant" is if, by chance, those particular samples show a large effect (and thus pass the p-value threshold).

In this way, low power leads to several statistical crises:

  • Low chance of detecting an effect
  • High chance of detected effects being wrong
  • High chance of severely over-estimating the size of an effect

Takehome? It's better to run far fewer, but higher-powered, studies than many low-powered studies that lead to false, and potentially harmful, recommendations to our farmers.

3 Home made R functions used

We have implemented several home-made R functions in the AMP; I have summarised them here for ease of reference:

4 Cohen’s D function

Calculating Cohen’s D:

Cohen’s D is a standardised measure of effect size, or difference, between treatment and control groups, with units of stdevs.

cohen_d <- function(d1,d2) {  
  m1 <- mean(d1, na.rm=TRUE)
  m2 <- mean(d2, na.rm=TRUE)
  s1 <- sd(d1, na.rm=TRUE)
  s2 <- sd(d2, na.rm=TRUE)
  spo <- sqrt((s1**2 + s2**2)/2)  # pooled standard deviation
  d <- (m1 - m2)/spo              # Cohen's d: mean difference in units of pooled SD
  rpb <- d / sqrt((d**2)+4)       # equivalent point-biserial correlation
  ret <- list("rpb" = rpb, "effectsi" = d)
  return(ret)  } 


#fake data
dt1 = rnorm(100,100,10)
dt2 = rnorm(100,150,10)

#use like this
cohen <- cohen_d(dt1, dt2)

5 Calculate non-parametric power

Calculating power for non-normal (non-parametric) data:

MCpower = function(sample1, sample2, size, alpha=0.05) {
  
  reps = 1000
  results <- sapply(1:reps, function(r) {
    # resample each group with replacement at the target sample size
    resample1 <- sample(sample1, size=size, replace=TRUE) 
    resample2 <- sample(sample2, size=size, replace=TRUE) 
    # non-parametric Wilcoxon (Mann-Whitney) test on the resampled data
    test <- wilcox.test(resample1, resample2, alternative="two.sided",paired=FALSE, correct=TRUE, exact=FALSE)
    test$p.value
  })
  # power = proportion of simulations that reach significance
  sum(results<alpha)/reps
}



#fake some data
#control
c1 <- rnorm(10, 0.5,0.2)
#lower estimate of treatment
tl<- c1 *1.15

#we use it like this
Non_norm_power <- MCpower(c1, tl, 100)

Non_norm_power
## [1] 0.524

6 Power curve for parametric (normal) data

new function

Pretty power plot for normal distributions:

plot_normal_power = function(control, treat_lower_estimate, treat_upper_estimate){
  
power_lw <- c() # initialise this empty so we can append to it
power_hi <- c() # initialise this empty so we can append to it
sample_sizes <- c(10,20,30,40,50,70,90,100,125,150,175,200,250,300,350,400,450,500,550,600,700)
for(i in sample_sizes){
  
  lower_cohen <- cohen_d(control, treat_lower_estimate)
  a <- pwr.t.test(d = lower_cohen$effectsi , n=i,  sig.level = 0.05, power = NULL)$power
  power_lw <- c(power_lw, a)
  
  upper_cohen <- cohen_d(control, treat_upper_estimate)
  b <- pwr.t.test(d = upper_cohen$effectsi , n=i,  sig.level = 0.05, power = NULL)$power
  power_hi <- c(power_hi, b)
  
}


marker <- round(pwr.t.test(d = lower_cohen$effectsi , n=NULL,  sig.level = 0.05, power = 0.8)$n)
marker2 <- round(pwr.t.test(d = upper_cohen$effectsi , n=NULL,  sig.level = 0.05, power = 0.8)$n)
ggplot() + geom_ribbon(aes(x=sample_sizes, ymin= power_lw, ymax=power_hi), alpha=0.2, colour="blue", fill="blue")  + xlab("Sample size") + ylab("Power") + geom_vline(xintercept = marker,size=1.3 ) + geom_hline(yintercept=0.8, linetype="dotted", size=1.5) + geom_vline(xintercept = marker2 ,size=1.3) + labs(title="Parametric power curve ", caption="Power curve indicating sample sizes needed for a 0.8 power (dotted lines)") + theme(plot.caption = element_text(hjust = 0.5)) +  geom_text(aes(x=marker, label=paste("\n N=", marker), y=0.5), colour="blue", angle=90, text=element_text(size=11)) +
  geom_text(aes(x=marker2, label=paste("N=",marker2,"\n"), y=0.5), colour="red", angle=90, text=element_text(size=11)) 
}


#fake some data
#control
c1 <- rnorm(100, 0.5,0.2)
#lower estimate of treatment
tl<- c1 *1.15
#upper estimate of treatment
tu <- c1 *1.25



plot_normal_power(c1,tl,tu)

Note that the dotted horizontal line indicates a power of 0.8, and the vertical lines indicates sample size estimates for upper and lower treatment effect estimates.

7 Power curve for non-parametric data

New function

Pretty power curves for non-parametric data

MCpower_curve = function(control, lower_estimate, upper_estimate, alpha=0.05){
  
  
  power_lw = c() # initialise this empty so we can append to it
  power_hi = c() # initialise this empty so we can append to it
  sample_sizes <- c(10,25,50,75,100,125,150,175,200,225,250,300,350,400,450,500,600,700)
  for(i in sample_sizes){
    x = MCpower(control,lower_estimate,size=i, alpha=alpha)
    power_lw = c(power_lw,x)
    x1 = MCpower(control,upper_estimate,size=i, alpha=alpha)
    power_hi = c(power_hi,x1)
    #ifelse( x < 0.8, print(paste(i,"=",x,x1)),0)
    }


  # crude way to get lowest N
  df1 = data.frame(samps = sample_sizes, "pw" = power_hi)
  df2 = data.frame(samps = sample_sizes, "pw" = power_lw)
  
  df1 = subset(df1, pw >=0.8)
  df2 = subset(df2, pw >=0.8)


  marker = ifelse(length(df1$pw)>0, df1$samps[1],0)
  marker2 = ifelse(length(df2$pw)>0, df2$samps[1],0)
  print(ggplot() + geom_ribbon(aes(x=sample_sizes, ymin= power_lw, ymax=power_hi), alpha=0.2, colour="blue", fill="blue")  + xlab("Sample size") + ylab("Power") + geom_vline(xintercept = marker,size=1.3 ) + geom_hline(yintercept=0.8, linetype="dotted", size=1.5) + geom_vline(xintercept = marker2 ,size=1.3) + labs(title="Non-parametric power curve ", caption="Power curve indicating sample sizes needed for a 0.8 power (dotted lines)") + theme(plot.caption = element_text(hjust = 0.5)) +  geom_text(aes(x=marker, label=paste("\n N=", marker), y=0.5), colour="blue", angle=90, text=element_text(size=11)) +
  geom_text(aes(x=marker2, label=paste("N=",marker2,"\n"), y=0.5), colour="red", angle=90, text=element_text(size=11)) )

  library(knitr)  
  
  kable(subset(data.frame(sample_size=sample_sizes, "upper_pw"=power_hi, "lower_pw"=power_lw), lower_pw <0.81 )   )
  
}

#fake some data
#control
c1 <- rnorm(1000, 0.5,0.2)
#lower estimate of treatment
tl<- c1 *1.1
#upper estimate of treatment
tu <- c1 *1.15



MCpower_curve(c1,tl,tu)

sample_size upper_pw lower_pw
10 0.103 0.073
25 0.191 0.111
50 0.364 0.215
75 0.517 0.287
100 0.624 0.351
125 0.725 0.404
150 0.804 0.515
175 0.841 0.545
200 0.897 0.604
225 0.925 0.669
250 0.959 0.715
300 0.977 0.761

Likewise - vertical bars indicate sample sizes needed for a 0.8 power at lower and upper effect estimates.

8 Randomize dataframes

The function below will take a dataframe and randomly assign rows (which could be sites, or individual farmers) to “treatment” or “control”

#lets first make up a fake list of farmer IDS from 1 to 1000 and 1000 random variables 
df = data.frame("OAFID"=seq(1,1000), "yield"=rnorm(1000))

#lets now make a function to do the work - you can copy paste this function into your own scripts

RCT_random = function(dataframey, values_to_add){
  
  set.seed(111)
  dataframey$values_to_add[sample(1:nrow(dataframey), nrow(dataframey), FALSE)] <- rep(values_to_add)
  colnames(dataframey)[which(colnames(dataframey)=="values_to_add")] = "Status"
  return(dataframey) }


# so this will take the dataframe called "df" and randomly assign each ROW to "Treatment" or "control"
df_new = RCT_random(df, c("Treatment","Control"))

kable(head(df_new, 10))
OAFID yield Status
1 0.6745768 Treatment
2 -0.4235405 Treatment
3 -0.7418825 Control
4 0.4375274 Treatment
5 -1.1139436 Control
6 1.7375220 Control
7 0.4059804 Control
8 -0.5534344 Control
9 -1.3451720 Control
10 -0.6603885 Treatment

9 Calculate intra cluster correlation (ICC)

This function will take multi-level data, e.g.

df = data.frame(FarmerID=seq(1,100), Maize_yield=rnorm(100,500,50),  group_name=c("farmers_first","farmers_second","farmers_third","farmers_last"), District=c("A","B"))

library(knitr)
kable(df[1:5,], format="markdown", align="c")
FarmerID Maize_yield group_name District
1 569.7911 farmers_first A
2 455.4007 farmers_second B
3 553.2515 farmers_third A
4 540.0405 farmers_last B
5 481.5963 farmers_first A

And calculate the ICC value and its significance:

ICC_CI <- function(cluster_level,outcomevar, dataf){
  
  #load library
require(ICC)
  set.seed(123)
  si = round(dim(dataf)[1]*0.66)
  values_f <- c()
  for(i in seq(1:50)){
  samp_f = dataf[sample(nrow(dataf), si), ]
  x_f = ICCbare(cluster_level,outcomevar,samp_f)
  values_f <- c(values_f, x_f)
  }
  # note that 1.96StDevs = 95% confidence interval bounds in a normal dist.
   ret = data.frame("Mean ICC" = round(mean(values_f, na.rm=TRUE),3), "CI" = round(1.96*sd(values_f, na.rm=TRUE),3))
   ret$Significant = ifelse(ret$Mean.ICC > ret$CI, "Y", "N")
  return( ret)
  
  }
#using kable to make the dataframe that is returned look pretty
stored_ICC <- ICC_CI("group_name", "Maize_yield", df)

stored_ICC
##   Mean.ICC    CI Significant
## 1    0.022 0.081           N

10 ICC sample size correction

This function will take in an original sample size (calculated from the functions above) and return a sample size that accounts for ICC effects

ICC_correction <- function(samplesize, num_clusters, ICC_estimate){
  
  
  
  average_cluster_size = samplesize/num_clusters
  
  factor_inflate = 1 + (average_cluster_size - 1) * ICC_estimate
  
  return(data.frame("New sample size"=round(samplesize*factor_inflate), "Old sample size"=samplesize   ))
  
  
}

ICC_correction(200, 50, 0.2)
##   New.sample.size Old.sample.size
## 1             320             200

11 Unique or NA

This is a simple function that, when given a vector, will return either a single value (if all values are identical) or NA. Useful for piping.

#new function will return a unique name or a NA if there is more than one unique value
uniqueorna <- function(x) {
  if (length(unique(x)) == 1) {
    unique(x)[1]
  } else {
    NA_character_
  }
}
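
#a quick check of its behaviour (made-up site names)
uniqueorna(c("SiteA", "SiteA", "SiteA")) # returns "SiteA"
uniqueorna(c("SiteA", "SiteB"))          # returns NA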

#useful in piping
library(dplyr)

#summarise factor columns to get unique or NA returned
"cdat <- dat[c("SiteName","DistrictName","Treatment")] %>%  
  group_by(SiteName) %>%  
    summarise_if(.predicate=is.factor, .funs=uniqueorna)""

12 Concise and pretty bar plots

The following function (nicebars) will plot treatment/control data and summary statistics (p-values and powers):

nicebars requires three arguments:

  • varname - the name of the variable you want to plot

  • dataframe1 - the dataframe to plot from

  • treatname - the variable indicating treatment/control

And is called like this nicebars("Xvariable of interest", dataframename, "column indicating treatment").

nicebars <- function(varname, dataframe1, treatname){
  
  library(ggplot2)
  library(cowplot) # cowplot provides add_sub() and plot_grid(), so load it before use
  
  dataframe1$temp = unlist(dataframe1[varname])
  dataframe1$treated = unlist(dataframe1[treatname])
  t1 =  unique(dataframe1$treated)[1]
  t2 =  unique(dataframe1$treated)[2]
  
  colnames(dataframe1)[which(colnames(dataframe1)==treatname)] = "treatname"
  # Monte Carlo power estimate using the MCpower function defined above
  tp = MCpower( subset(dataframe1,treated==t1)$temp , subset(dataframe1,treated==t2)$temp , size= length(subset(dataframe1, treated==t2)$temp ))
  
  # bar of group means with bootstrapped CIs, plus power and Wilcoxon p-value as a subtitle
  gx = ggplot(dataframe1, aes(treatname,temp)) + stat_summary(fun.y = mean, geom = "bar") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width=0.25) + ggtitle(varname)  
  gx = add_sub(gx, paste("stat power ~ ",round(tp,3),"\n","Wilcoxon P-value:",format(wilcox.test(subset(dataframe1, treated==t1)$temp, subset(dataframe1, treated==t2)$temp  ,paired=FALSE, correct=TRUE, exact=FALSE)$p.value,scientific = TRUE,digits=3) )) 
  plot_grid(gx,ncol =1, nrow = 1)
}

#fake data
dat = data.frame(ID = seq(1,1000))

df_new = RCT_random(dat, c("Treatment","Control"))

df_new$Xvar = rnorm(1000,100,100)

df_new$Xvar[df_new$Status=="Treatment"] = df_new$Xvar[df_new$Status=="Treatment"]*1.1 + rnorm(sum(df_new$Status=="Treatment"),0,20)

nicebars("Xvar", df_new, "Status")

13 Minimum detectable effect (MDE) calculation

This function calculates the MDE we would expect for a power of 0.8; see AMP lesson 2 for an MDE refresher.

posthoc_mde = function(d1,d2){
  
  require(pwr)
  
  #print(paste("mean diff (d2-d1)", mean(d2, na.rm=TRUE)-mean(d1,na.rm=TRUE)))
  s1= sd(d1,na.rm=TRUE)
  s2=sd(d2,na.rm=TRUE)
  spo = sqrt((s1**2 + s2**2)/2)
  mded = pwr.t.test(n = length(d1), d=NULL, sig.level = 0.05, power=0.8, type="two.sample", alternative = "two.sided")$d
  mde_out = mded * spo
  return(mde_out)
}


subs1 = subset(df_new, Status=="Treatment")
subs2 = subset(df_new, Status=="Control")

mde1 = posthoc_mde(subs1$Xvar, subs2$Xvar)
mde1
## [1] 18.46855

Remember the units are the same as whatever you gave the function!

14 Plot linear mixed effect model diagnostics

This function will produce some handy graphs for diagnosing the effectiveness of a linear mixed effects model

# a function to plot some lme diagnostics. 
plot_lme = function(fob,realdata,df, namey){
  
  
  # FOB = model object
  # real data is a vector of values
  # df is the dataframe
  # namey is the names of any other key variables
  
  #realdata = dat$TotalCredit
  #stnd residuals
  stdr = residuals(fob)/ sd(residuals(fob))
  sqres = sqrt(stdr**2)
  
  par(mfrow=c(2,3))
  
  qqnorm(stdr, ylab='Standardized residuals', main="residuals QQplot")
  qqline(stdr)
  
  # fitted vs actuals
  
  ran = ranef(fob)

#plot(fitted(fob), realdata, main="fitted vs actual",xlab="Fitted",ylab="actual")
  plot(ran$`(Intercept)`, seq(1:length(ran$`(Intercept)`)), main="random effects", ylab="Site number", xlab="Random effect")
# fitted vs resid
  plot(fitted(fob), stdr,  main="fitted vs residuals",xlab="Fitted",ylab="standardized residuals")

# estimate leverage
  ma = as.matrix(df[,c(namey)])
  #ma = as.matrix(dat[,c('TotalEnrolledSeasons', 'Maize.adoption','Maize.acres')])
  #rm(hatty)
  # want X * (Xt * X)^-1 * X
  hatty = ma %*% solve(t(ma) %*% ma) %*% t(ma)  
  #plot leverage against std.resid.
  plot(diag(hatty), stdr, main="Leverage-residuals",xlab="est.Leverage",ylab="residuals")
  
  res_lme=residuals(fob)
  plot(res_lme, main="Residuals vs index", ylab="resid")
  abline(a=0,b=0, lwd=4, col="red")
  
  
  qqnorm(ranef(fob)$`(Intercept)` , main="Random effects QQplot")
  qqline(ranef(fob)$`(Intercept)` )
  
}


#generate data
set.seed(123)

 
indivs <- 12 #aka J
clusters <- 75 #aka N
 
tdf <- data.frame( Cluster = sort(rep(c(1:clusters),indivs)),        Indiv = rep(c(1:indivs),clusters) , x = rnorm(n = indivs*clusters,0,5) )

#we now have X

# lets get intercept diff
inte <- rnorm(clusters,0,50)
ntdf = c()
intey = 25 + inte

#assign distinct beta to each cluster
for(i in seq(1:length(unique(tdf$Cluster))) ) {
  
  cc = unique(tdf$Cluster)[i]
  val = inte[i]
  #print(paste(i,cc,val))
  
  tx = subset(tdf, Cluster==cc)
  tx$inte = val
  
  ntdf = rbind(tx,ntdf)
  
  
}

# y = intercept + x * beta + iid noise
ntdf$y <- 25 + ntdf$inte + (ntdf$x * rnorm(clusters*indivs,3,0.5)) + rnorm(clusters*indivs,0,15)

d = ntdf


#gen model

library(nlme)
mixmo <- lme(fixed=y~ x ,data=d, random=~1|Cluster,na.action=na.exclude,method="ML")



plot_lme(mixmo, d$y, d, c("x"))

15 Some required reading

At the end of the AMP, I wanted to recommend some further reading. The first part of this reading list contains example analyses from social science/economics fields that will illustrate the quality level we are aiming for. Note that these analyses are very thorough and professional, and yet very concise. I would like every AMPee to imitate the style of these papers:

  • Suri et al. - The long-run poverty and gender impacts of mobile money

Finally, here are some resources for self-teaching. Whilst the AMP so far has aimed to be maths-lite, to become a proper analyst/statistician you will need to learn some maths (mostly algebra, matrices, graph transformations and calculus):