Coincidence Analysis (CNA): Reproducing Baumgartner and Epple 2014 in R


To learn more about Coincidence Analysis (CNA), I worked through the article by Baumgartner and Epple, A Coincidence Analysis of a Causal Chain: The Swiss Minaret Vote, to see how they apply the method described in Baumgartner 2009 (Inferring Causal Complexity). Since I was trying to reproduce the calculations in R, I decided to share my efforts. As always, comments are most welcome.

Data

I re-created the raw data set used to explain the 2009 Swiss minaret vote from the truth table in Table 1 of the article.

data <- read.csv(header = TRUE, text =
"Canton,A,L,S,T,X,M
LU,1,0,1,1,1,1
UR,1,0,1,1,1,1
SZ,1,0,1,1,1,1
OW,1,0,1,1,1,1
NW,1,0,1,1,1,1
AR,1,0,1,1,1,1
AI,1,0,1,1,1,1
GL,1,0,1,0,1,1
ZG,1,0,1,0,1,1
SO,1,0,1,0,1,1
SG,1,0,1,0,1,1
AG,1,0,1,0,1,1
VD,0,1,0,0,0,0
NE,0,1,0,0,0,0
GE,0,1,0,0,0,0
GR,0,0,1,1,1,1
TG,0,0,1,1,1,1
ZH,1,1,1,0,1,1
BE,1,1,1,1,1,1
FR,1,0,0,1,0,1
BS,1,1,0,0,0,0
BL,1,1,0,0,1,1
SH,0,1,1,0,1,1
TI,0,0,0,0,1,1
VS,0,0,0,1,0,1
JU,0,1,0,1,0,1")

row.names(data) <- data$Canton
data$Canton <- NULL

The abbreviations stand for:

M: the vote of Swiss cantons to ban minarets
A: high rate of old xenophobia
T: traditional economic structure
L: strong left parties
S: high share of native Serbian, Croatian, Albanian speakers
X: high rate of new xenophobia

Here is the reproduced truth table, created with the cna package:

library(cna)
tt <- truthTab(data)
tt

##  A L S T X M no.of.cases                cases
##  1 0 1 1 1 1           7 LU,UR,SZ,OW,NW,AR,AI
##  1 0 1 0 1 1           5       GL,ZG,SO,SG,AG
##  0 1 0 0 0 0           3             VD,NE,GE
##  0 0 1 1 1 1           2                GR,TG
##  1 1 1 0 1 1           1                   ZH
##  1 1 1 1 1 1           1                   BE
##  1 0 0 1 0 1           1                   FR
##  1 1 0 0 0 0           1                   BS
##  1 1 0 0 1 1           1                   BL
##  0 1 1 0 1 1           1                   SH
##  0 0 0 0 1 1           1                   TI
##  0 0 0 1 0 1           1                   VS
##  0 1 0 1 0 1           1                   JU
## Total no.of.cases: 26
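As a cross-check that does not depend on the cna package, the same aggregation can be sketched in base R. The compact string encoding of the cantons and the names `cfg` and `tt_manual` are my own choices for this illustration, not part of the paper or the package.

```r
# Configurations from Table 1, one string per canton (order: A, L, S, T, X, M)
cfg <- c(LU = "101111", UR = "101111", SZ = "101111", OW = "101111",
         NW = "101111", AR = "101111", AI = "101111", GL = "101011",
         ZG = "101011", SO = "101011", SG = "101011", AG = "101011",
         VD = "010000", NE = "010000", GE = "010000", GR = "001111",
         TG = "001111", ZH = "111011", BE = "111111", FR = "100101",
         BS = "110000", BL = "110011", SH = "011011", TI = "000011",
         VS = "000101", JU = "010101")

# Count how often each distinct configuration occurs
counts <- table(cfg)

# One row per configuration, with its frequency and the cantons behind it
tt_manual <- data.frame(
  config = names(counts),
  n      = as.integer(counts),
  cases  = vapply(names(counts),
                  function(k) paste(names(cfg)[cfg == k], collapse = ","),
                  character(1)),
  stringsAsFactors = FALSE
)
tt_manual <- tt_manual[order(-tt_manual$n), ]
rownames(tt_manual) <- NULL
tt_manual
```

The result should list the same 13 configurations and 26 cases as the truth table above, only with the factor values collapsed into one string per row.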

CNA

First, A and T are labeled as exogenous factors. Thus, only the factors L, S, X, and M are endogenous and can be considered as effects in the CNA. Further, because of temporal ordering, M cannot cause L, S, or X.

In sum, the authors propose the following causal order of factors:

A, T < L, S < X < M

#devtools::install_github('christophergandrud/d3Network')
library(d3Network)
Source <- c("old xenophobia and trad econ structures", "strong left and native foreign speakers", "new xenophobia")
Target <- c("strong left and native foreign speakers", "new xenophobia", "minaret ban")
NetworkData <- data.frame(Source, Target)
d3SimpleNetwork(NetworkData, width = 500, height = 400, file="network1.html", iframe = T, fontsize = 15)
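The ordering also determines which factors may appear as causes of which outcome in the steps that follow. A minimal base R sketch of that rule, assuming that factors on the same or an earlier level count as candidate causes (this matches the candidates used below); the names `level`, `exogenous`, and `candidate_causes` are hypothetical helpers, not cna functions:

```r
# Causal ordering from the paper: A, T < L, S < X < M
level <- c(A = 1, T = 1, L = 2, S = 2, X = 3, M = 4)
exogenous <- c("A", "T")

# Hypothetical helper: a factor is a candidate cause of `outcome`
# if it sits on the same or an earlier level of the ordering
candidate_causes <- function(outcome) {
  ok <- level <= level[[outcome]] & names(level) != outcome
  names(level)[ok]
}

# Only endogenous factors are treated as outcomes
outcomes <- setdiff(names(level), exogenous)
lapply(setNames(outcomes, outcomes), candidate_causes)
```

For L this yields A, T, and S, and for X it yields every factor except M.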

Sufficient Conditions

The authors look for sufficient conditions for L first. As discussed above, only A, T, and S are considered as candidates for sufficient conditions of L.

library(dplyr)
suff.cond <- msc(cna(select(data, A, T, S, L)))
filter(suff.cond, outcome == "L")

##   outcome  condition consistency coverage
## 1       L A*s*t -> L           1   0.2222
## 2       L a*S*t -> L           1   0.1111
##

Next, look at sufficient conditions for S. Only A, T, and L are allowed:

suff.cond <- msc(cna(select(data, A, T, S, L)))
filter(suff.cond, outcome == "S")

##   outcome  condition consistency coverage
## 1       S A*l*t -> S           1  0.29412
## 2       S A*L*T -> S           1  0.05882
##

Next, look at sufficient conditions for X. All factors except M are allowed:

suff.cond <- msc(cna(select(data, A, T, S, L, X)))
filter(suff.cond, outcome == "X")

##   outcome  condition consistency coverage
## 1       X     S -> X           1  0.89474
## 2       X   l*t -> X           1  0.31579
## 3       X A*L*T -> X           1  0.05263
##

Next, look at sufficient conditions for M:

suff.cond <- msc(cna(data))
filter(suff.cond, outcome == "M")

##   outcome condition consistency coverage
## 1       M    X -> M           1   0.8636
## 2       M    l -> M           1   0.7727
## 3       M    S -> M           1   0.7727
## 4       M    T -> M           1   0.5909
##
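The consistency and coverage columns can be checked by hand. For crisp-set data, consistency is the share of cases showing the condition that also show the outcome, and coverage is the share of outcome cases that the condition accounts for. The sketch below recomputes both in base R; `con_cov` is a hypothetical helper written for this post, and the data are re-entered in a compact string form so that the block runs on its own.

```r
# Configurations from Table 1, one string per canton (order: A, L, S, T, X, M)
cfg <- c(LU = "101111", UR = "101111", SZ = "101111", OW = "101111",
         NW = "101111", AR = "101111", AI = "101111", GL = "101011",
         ZG = "101011", SO = "101011", SG = "101011", AG = "101011",
         VD = "010000", NE = "010000", GE = "010000", GR = "001111",
         TG = "001111", ZH = "111011", BE = "111111", FR = "100101",
         BS = "110000", BL = "110011", SH = "011011", TI = "000011",
         VS = "000101", JU = "010101")
d <- as.data.frame(do.call(rbind, lapply(strsplit(cfg, ""), as.integer)))
names(d) <- c("A", "L", "S", "T", "X", "M")

# Hypothetical helper: `cond` is a logical vector marking the cases in which
# the condition holds, `outcome` is the 0/1 outcome column
con_cov <- function(cond, outcome) {
  both <- sum(cond & outcome == 1)
  c(consistency = both / sum(cond),
    coverage    = both / sum(outcome == 1))
}

# A*s*t -> L (upper case = factor present, lower case = factor absent)
round(con_cov(d$A == 1 & d$S == 0 & d$T == 0, d$L), 4)
# S -> X
round(con_cov(d$S == 1, d$X), 4)
# X -> M
round(con_cov(d$X == 1, d$M), 4)
```

The three calls should reproduce the coverage values 0.2222, 0.8947, and 0.8636 reported by msc above, each with a consistency of 1.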

Atomic solution formulas

Next, the authors combine the minimally sufficient conditions to produce atomic solution formulas. In order to reproduce the formulas on page 292, the cna function needs to be adjusted to allow lower coverage values.

sol.form <- asf(cna(select(data, A, T, S, L), cov = 0.3))
filter(sol.form, outcome == "L")

##   outcome           condition consistency coverage
## 1       L a*S*t + A*s*t <-> L           1   0.3333
##

sol.form <- asf(cna(select(data, A, T, S, L), cov = 0.3))
filter(sol.form, outcome == "S")

##   outcome           condition consistency coverage
## 1       S A*l*t + A*L*T <-> S           1   0.3529
##

sol.form <- asf(cna(select(data, A, T, S, L, X), cov = 0.9))
filter(sol.form, outcome == "X")

##   outcome     condition consistency coverage
## 1       X l*t + S <-> X           1   0.9474
##

Here, the asf function already returns the reduced version of equation (3), i.e. equation (5) on page 293.

sol.form <- asf(cna(data, cov = 1))
filter(sol.form, outcome == "M")

##   outcome   condition consistency coverage
## 1       M T + X <-> M           1        1
##

Again, the function already produces the reduced version.

The authors argue that, due to the weak coverage values, the atomic solution formulas for L and S cannot be meaningfully interpreted and that this is probably a sign of omitted variable bias. Both factors are thus considered to be exogenous. X and M are to be explained.

Next, the authors analyze which parts of the solution formulas are redundant, i.e. add little to the explanation of either M or X:

## showing that A*L*T is redundant for explaining X, no coverage drop
## see equations 5:7 in paper
print(condition("A*L*T + l*t + S -> X", truthTab(select(data, !M))), print.table =F)

## A*L*T+l*t+S -> X :
## Consistency: 1.000 (18/18)
## Coverage:    0.947 (18/19)
## Total no. of cases: 26
## Unique Coverages: A*L*T : 0.000 (0/19)
##                   l*t   : 0.053 (1/19)
##                   S     : 0.579 (11/19)

print(condition("l*t + S -> X", truthTab(select(data, !M))), print.table =F)

## l*t+S -> X :
## Consistency: 1.000 (18/18)
## Coverage:    0.947 (18/19)
## Total no. of cases: 26
## Unique Coverages: l*t : 0.053 (1/19)
##                   S   : 0.632 (12/19)

print(condition("A*L*T + S -> X", truthTab(select(data, !M))), print.table =F)

## A*L*T+S -> X :
## Consistency: 1.000 (17/17)
## Coverage:    0.895 (17/19)
## Total no. of cases: 26
## Unique Coverages: A*L*T : 0.000 (0/19)
##                   S     : 0.842 (16/19)

print(condition("A*L*T + l*t -> X", truthTab(select(data, !M))), print.table =F)

## A*L*T+l*t -> X :
## Consistency: 1.000 (7/7)
## Coverage:    0.368 (7/19)
## Total no. of cases: 26
## Unique Coverages: A*L*T : 0.053 (1/19)
##                   l*t   : 0.316 (6/19)

## showing that l and S are redundant for explaining M
## see equations 8:12 in paper
print(condition("l + S + T + X -> M", truthTab(data)), print.table =F)

## l+S+T+X -> M :
## Consistency: 1.000 (22/22)
## Coverage:    1.000 (22/22)
## Total no. of cases: 26
## Unique Coverages: l : 0.000 (0/22)
##                   S : 0.000 (0/22)
##                   T : 0.045 (1/22)
##                   X : 0.045 (1/22)

print(condition("l + S + X -> M", truthTab(data)), print.table =F)

## l+S+X -> M :
## Consistency: 1.000 (21/21)
## Coverage:    0.955 (21/22)
## Total no. of cases: 26
## Unique Coverages: l : 0.091 (2/22)
##                   S : 0.000 (0/22)
##                   X : 0.045 (1/22)

print(condition("l + S + T -> M", truthTab(data)), print.table =F)

## l+S+T -> M :
## Consistency: 1.000 (21/21)
## Coverage:    0.955 (21/22)
## Total no. of cases: 26
## Unique Coverages: l : 0.045 (1/22)
##                   S : 0.091 (2/22)
##                   T : 0.045 (1/22)

print(condition("T -> M", truthTab(data)), print.table =F)

## T -> M :
## Consistency: 1.000 (13/13)
## Coverage:    0.591 (13/22)
## Total no. of cases: 26

print(condition("X -> M", truthTab(data)), print.table =F)

## X -> M :
## Consistency: 1.000 (19/19)
## Coverage:    0.864 (19/22)
## Total no. of cases: 26
##
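The unique coverages that drive these redundancy arguments can also be recomputed by hand: a disjunct's unique coverage is the share of outcome cases that it covers and that no other disjunct in the formula covers. A minimal base R sketch, where `unique_cov` and `terms_x` are hypothetical names introduced for this illustration and the data are re-entered compactly so the block is self-contained:

```r
# Configurations from Table 1, one string per canton (order: A, L, S, T, X, M)
cfg <- c(LU = "101111", UR = "101111", SZ = "101111", OW = "101111",
         NW = "101111", AR = "101111", AI = "101111", GL = "101011",
         ZG = "101011", SO = "101011", SG = "101011", AG = "101011",
         VD = "010000", NE = "010000", GE = "010000", GR = "001111",
         TG = "001111", ZH = "111011", BE = "111111", FR = "100101",
         BS = "110000", BL = "110011", SH = "011011", TI = "000011",
         VS = "000101", JU = "010101")
d <- as.data.frame(do.call(rbind, lapply(strsplit(cfg, ""), as.integer)))
names(d) <- c("A", "L", "S", "T", "X", "M")

# Hypothetical helper: `terms` is a named list of logical vectors,
# one per disjunct; `outcome` is the 0/1 outcome column
unique_cov <- function(terms, outcome) {
  # cases x terms matrix: does the term hold in an outcome case?
  hit <- sapply(terms, function(tm) tm & outcome == 1)
  uc <- sapply(seq_along(terms), function(i) {
    covered_elsewhere <- rowSums(hit[, -i, drop = FALSE]) > 0
    sum(hit[, i] & !covered_elsewhere)
  })
  setNames(uc / sum(outcome == 1), names(terms))
}

terms_x <- list("A*L*T" = d$A == 1 & d$L == 1 & d$T == 1,
                "l*t"   = d$L == 0 & d$T == 0,
                "S"     = d$S == 1)

# A*L*T + l*t + S -> X
round(unique_cov(terms_x, d$X), 3)
# l*t + S -> X
round(unique_cov(terms_x[c("l*t", "S")], d$X), 3)
```

A*L*T has a unique coverage of zero because its only case, BE, is already covered by S, which is why dropping it costs no coverage.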

Thus, in combination, the paper proposes the following solution formula:

(l*t + S -> X) * (T + X -> M)

This can also be reproduced by setting the cna coverage threshold to 0.947.

csf(cna(data, cov = 0.947))

##                               condition consistency coverage
## 1     (T + X <-> M) &   (M*t + S <-> X)       1.000    1.000
## 2 (l + S + T <-> M) &   (M*t + S <-> X)       1.000    0.955
## 3     (l + X <-> M) &   (M*t + S <-> X)       1.000    0.955
## 4     (T + X <-> M) & (A*L*M + S <-> X)       1.000    0.947
## 5 (l + S + T <-> M) & (A*L*M + S <-> X)       1.000    0.947
## 6     (l + X <-> M) & (A*L*M + S <-> X)       1.000    0.947
## 7     (T + X <-> M) &   (l*t + S <-> X)       1.000    0.947
## 8 (l + S + T <-> M) &   (l*t + S <-> X)       1.000    0.947
## 9     (l + X <-> M) &   (l*t + S <-> X)       1.000    0.947
##

Following the exclusion criteria above, solutions 1 to 6 need to be excluded because they use M to explain X, which contradicts the causal ordering X < M. The explanatory parts for M in solutions 8 and 9 have been excluded above. Thus, solution formula 7 is accepted.
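The same selection can be sketched as a plain string filter over the nine formulas. This is only an illustration of the two exclusion steps, with the formulas typed in by hand (whitespace normalized) rather than extracted from the cna result; `csf_out`, `x_part`, `m_part`, and `kept` are hypothetical names.

```r
# Complex solution formulas as printed above, in the same order (1 to 9)
csf_out <- c("(T + X <-> M) & (M*t + S <-> X)",
             "(l + S + T <-> M) & (M*t + S <-> X)",
             "(l + X <-> M) & (M*t + S <-> X)",
             "(T + X <-> M) & (A*L*M + S <-> X)",
             "(l + S + T <-> M) & (A*L*M + S <-> X)",
             "(l + X <-> M) & (A*L*M + S <-> X)",
             "(T + X <-> M) & (l*t + S <-> X)",
             "(l + S + T <-> M) & (l*t + S <-> X)",
             "(l + X <-> M) & (l*t + S <-> X)")

# Left-hand side of the part that explains X
x_part <- sub("^.*& \\((.*) <-> X\\)$", "\\1", csf_out)
# M comes after X in the causal ordering, so it must not be a cause of X
violates_order <- grepl("M", x_part, fixed = TRUE)

# Left-hand side of the part that explains M
m_part <- sub("^\\((.*) <-> M\\).*$", "\\1", csf_out)

# Keep formulas that respect the ordering and explain M by T + X only
kept <- which(!violates_order & m_part == "T + X")
kept
```

Only formula 7 passes both checks.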

Source <- c("weak left parties AND non-traditional economic sector","many native foreign speakers", "new xenophobia", "trad econ structures")
Target <- c("new xenophobia", "new xenophobia", "minaret ban", "minaret ban")
NetworkData <- data.frame(Source, Target)
d3SimpleNetwork(NetworkData, width = 500, height = 400, file="network2.html", iframe = T, fontsize = 15)

QCA

In a next step, the authors try to reproduce the CNA results with QCA, especially the chained nature of causality with new xenophobia as an intermediary. Here, I try to reproduce their results with the QCA package.

Sufficient conditions

As also discussed in the text, without the inclusion of logical remainders in the minimization, QCA lists a large number of partial explanations (see equation 14):

library(QCA)
qca.table <- truthTable(data, outcome = "M", show.cases = TRUE, sort.by= "n")
eqmcc(qca.table, details=TRUE, show.cases = TRUE)

##
## n OUT = 1/0/C: 22/4/0
##   Total      : 26
##
## Number of multiple-covered cases: 9
##
## M1: ASX + ALtX + asTx + lsTx + lSTX + LStX + alstX <=> M
##
##           incl   cov.r  cov.u  cases
## ---------------------------------------------------------------------------
## 1  ASX    1.000  0.636  0.273  LU,UR,SZ,OW,NW,AR,AI; GL,ZG,SO,SG,AG; ZH; BE
## 2  ALtX   1.000  0.091  0.045  BL; ZH
## 3  asTx   1.000  0.091  0.045  VS; JU
## 4  lsTx   1.000  0.091  0.045  VS; FR
## 5  lSTX   1.000  0.409  0.091  LU,UR,SZ,OW,NW,AR,AI; GR,TG
## 6  LStX   1.000  0.091  0.045  SH; ZH
## 7  alstX  1.000  0.045  0.045  TI
## ---------------------------------------------------------------------------
##    M1     1.000  1.000
##

The same can be done for X only (see equation 15 in the paper):

qca.table.x <- truthTable(data, outcome = "X", conditions = c("A", "L", "S", "T"), show.cases = TRUE, sort.by= "n")
eqmcc(qca.table.x, details=TRUE, show.cases = TRUE)

##
## n OUT = 1/0/C: 18/8/0
##   Total      : 26
##
## Number of multiple-covered cases: 8
##
## M1: AS + lST + LSt + alst => X
##
##          incl   cov.r  cov.u  cases
## --------------------------------------------------------------------------
## 1  AS    1.000  0.737  0.316  LU,UR,SZ,OW,NW,AR,AI; GL,ZG,SO,SG,AG; ZH; BE
## 2  lST   1.000  0.474  0.105  LU,UR,SZ,OW,NW,AR,AI; GR,TG
## 3  LSt   1.000  0.105  0.053  SH; ZH
## 4  alst  1.000  0.053  0.053  TI
## --------------------------------------------------------------------------
##    M1    1.000  0.947
##

With logical remainders, the number of solutions is reduced:

qca.table.full <- truthTable(data, outcome = "M", show.cases = TRUE, sort.by= "n", complete = T)
eqmcc(qca.table.full, details=TRUE, show.cases = TRUE, include = "?")

##
## n OUT = 1/0/C: 22/4/0
##   Total      : 26
##
## Number of multiple-covered cases: 10
##
## M1: T + X <=> M
##
##        incl   cov.r  cov.u
## --------------------------
## 1  T   1.000  0.591  0.136
## 2  X   1.000  0.864  0.409
## --------------------------
##    M1  1.000  1.000
##
##        cases
## ------------
## 1  T   LU,UR,SZ,OW,NW,AR,AI; GR,TG; VS; JU; FR; BE
## 2  X   LU,UR,SZ,OW,NW,AR,AI; GL,ZG,SO,SG,AG; GR,TG; TI; SH; BL; ZH; BE
## ------------

Here, QCA identifies the same causal link as the CNA, i.e. T + X -> M, but misses the causal chain around X.
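How much the minimization for M leans on logical remainders can be gauged by counting the configurations of the five conditions that no canton exhibits. A small base R sketch, independent of the QCA package; the variable names are my own and the data are re-entered compactly so the block runs on its own:

```r
# Configurations from Table 1, one string per canton (order: A, L, S, T, X, M)
cfg <- c(LU = "101111", UR = "101111", SZ = "101111", OW = "101111",
         NW = "101111", AR = "101111", AI = "101111", GL = "101011",
         ZG = "101011", SO = "101011", SG = "101011", AG = "101011",
         VD = "010000", NE = "010000", GE = "010000", GR = "001111",
         TG = "001111", ZH = "111011", BE = "111111", FR = "100101",
         BS = "110000", BL = "110011", SH = "011011", TI = "000011",
         VS = "000101", JU = "010101")
d <- as.data.frame(do.call(rbind, lapply(strsplit(cfg, ""), as.integer)))
names(d) <- c("A", "L", "S", "T", "X", "M")

# Conditions used to explain M
conds <- c("A", "L", "S", "T", "X")

# All 2^5 logically possible configurations, as strings such as "10111"
possible <- do.call(paste0, expand.grid(rep(list(0:1), length(conds))))
# Configurations that at least one canton exhibits
observed <- unique(do.call(paste0, d[conds]))
# Logical remainders: possible but unobserved configurations
remainders <- setdiff(possible, observed)

length(possible)
length(observed)
length(remainders)
```

With 13 of 32 configurations observed, 19 are logical remainders, so the parsimonious solution rests on assumptions about more configurations than were actually observed.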

In a step-wise approach, when looking for conditions for X with logical remainders, the same result is produced.

qca.table.full <- truthTable(data, outcome = "X", conditions = c("A", "L", "S", "T"), show.cases = TRUE, sort.by= "n", complete = T)
eqmcc(qca.table.full, details=TRUE, show.cases = TRUE, include = "?")

##
## n OUT = 1/0/C: 18/8/0
##   Total      : 26
##
## Number of multiple-covered cases: 5
##
## M1: S + lt => X
##
##        incl   cov.r  cov.u
## --------------------------
## 1  S   1.000  0.895  0.632
## 2  lt  1.000  0.316  0.053
## --------------------------
##    M1  1.000  0.947
##
##        cases
## ------------
## 1  S   LU,UR,SZ,OW,NW,AR,AI; GL,ZG,SO,SG,AG; GR,TG; SH; ZH; BE
## 2  lt  GL,ZG,SO,SG,AG; TI
## ------------

Thus, the QCA confirms the results of the CNA. Yet, it only does this by introducing the analysis of logical remainders and accepting logical contradictions due to the simplifying assumptions in the process.

Written by

Tobias Weise

university management specialist