A Tutorial on Clustering Analysis for Bayesian Graphical Models using the R packages bgms and easybgm
Introduction
This is a short software tutorial accompanying the paper “Stochastic Block Models for Clustering in Psychological Networks.” The paper introduces the Stochastic Block Model as a prior on the network structure of the ordinal Markov random field (cf., Marsman et al., in press). Apart from providing an elegant approach for incorporating the theoretical assumption of clustering directly into the statistical model, the approach comes with the methodological benefit of allowing researchers to infer the number of clusters in the network model given the data at hand, as well as to test hypotheses about clustering.
In this brief tutorial, we first illustrate how this analysis can be done using the R package bgms (Marsman et al., 2023), and then using the user-friendly wrapper package easybgm (Huth et al., 2024). We will start with the package bgms, which includes the core code to perform the analysis. Then, we will demonstrate how the same analysis can be performed using easybgm, which offers a more accessible way to run the same procedure for those without a strong background in R.
bgms
Currently, the version of bgms that includes the SBM prior is not yet available on CRAN. Therefore, we need to use the following code to install the stable GitHub version:
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github("MaartenMarsman/bgms")
Now, say we wish to analyze whether the network structure inferred from our empirical dataset exhibits clustering. To do this, we estimate the most representative vector of node allocations given the data, which simply denotes the cluster to which each node is assigned, thereby providing insight into potential clustering in the network. We also wish to test hypotheses about clustering using the cluster Bayes factors. All of this can be achieved using the bgm function in the bgms package.
The first step is to estimate the model using the bgm function, setting the edge_prior argument to "Stochastic-Block". There are also additional arguments relevant to the SBM. One is the concentration parameter of the Dirichlet prior on the cluster allocation probabilities. This parameter controls the expected cluster sizes: higher values lead to more uniform cluster sizes, while lower values promote more uneven, possibly sparse cluster allocations. The default is dirichlet_alpha = 1. Another is the rate parameter (\(\lambda\)) of the truncated Poisson prior, which governs the prior on the number of clusters. This parameter reflects the expected number of clusters before observing the data; by default, lambda = 1, which implies no prior assumption of clustering. In the example code below, we use the default values for these prior hyperparameters. We also leave the default settings for the remaining function arguments. However, we strongly recommend that researchers carefully consider these values when conducting their own analyses. For more details, see, for example, Marsman et al. (2023) and Huth et al. (2024).
library(bgms) # load the bgms package
# load the data
fit <- bgm(dataset, edge_prior = "Stochastic-Block")
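If one wishes to deviate from the defaults, the two SBM hyperparameters discussed above can be passed to bgm directly. A minimal sketch (the values shown are the package defaults, repeated here only to make the arguments explicit):

```r
# Same model with the SBM hyperparameters spelled out explicitly.
# The values shown are the defaults; they are repeated here only to
# illustrate where the arguments go.
fit <- bgm(dataset,
           edge_prior      = "Stochastic-Block",
           dirichlet_alpha = 1,  # concentration of the Dirichlet prior
           lambda          = 1)  # rate of the truncated Poisson prior
```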
First, we check the estimated cluster allocation vector. The output is a vector whose length equals the number of variables (columns) in the dataset. To examine this, we can inspect the allocations vector in the fit object, where the results of our analysis are stored.
"allocations"]] fit[[
We can also examine the estimated posterior probabilities for all possible numbers of clusters. In the software, this ranges from 1 (i.e., no clustering) to the total number of variables in the dataset.
"components"]] fit[[
These posterior probabilities are particularly useful for calculating the cluster Bayes factors, which are defined as follows:
\[\text{BF}_{10} = \underbrace{\frac{p(\mathcal{H}_1\mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})}}_{\text{Posterior Odds}} \bigg/ \underbrace{\frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}}_{\text{Prior Odds}}\]
Depending on the hypotheses, we consider two types of Bayes factors:
- The Bayes factor testing the hypothesis of clustering, i.e., \(\mathcal{H}_1: B > 1\), against the null hypothesis of no clustering \(\mathcal{H}_0: B = 1\). For this, we need to calculate the posterior odds of the two hypotheses, which is simply the ratio of the posterior probabilities of the two hypotheses. For \(\mathcal{H}_1: B > 1\), we would need to sum the posterior probabilities of all possible numbers of clusters greater than 1.
H1 <- unname(fit[["components"]][-1, 2]) # posterior probabilities under H1
H0 <- unname(fit[["components"]][1, 2])  # posterior probability under H0
posterior_odds <- sum(H1) / H0
After calculating the posterior odds, we need to divide these by the prior odds in order to obtain the Bayes factor. For this first Bayes factor, the prior odds are defined as:
\[ \frac{(\exp(\lambda) - 1 - \lambda)}{\lambda}.\] Since we leave the default value for the rate parameter \(\lambda\) of the Poisson distribution, we calculate the prior odds as follows:
lambda <- 1
prior_odds <- (exp(lambda) - 1 - lambda) / lambda
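The closed-form expression above follows from the zero-truncated Poisson prior, under which \(p(B = b) \propto \lambda^b / b!\) for \(b \geq 1\). As a sanity check (a sketch, not part of bgms), we can recover the same value numerically by normalizing these prior probabilities over a large range of \(b\):

```r
# Sanity check: recover the prior odds numerically from the
# zero-truncated Poisson prior, p(B = b) = lambda^b / (b! * (exp(lambda) - 1)).
lambda <- 1
b <- 1:100  # large upper bound approximates the infinite support
prior_probs <- lambda^b / (factorial(b) * (exp(lambda) - 1))
prior_odds_numeric <- sum(prior_probs[-1]) / prior_probs[1]  # p(B > 1) / p(B = 1)
prior_odds_closed  <- (exp(lambda) - 1 - lambda) / lambda
all.equal(prior_odds_numeric, prior_odds_closed)  # TRUE
```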
Finally, the Bayes factor is given by:
BF_10 <- posterior_odds / prior_odds
BF_10     # Bayes factor in favor of H1
1 / BF_10 # Bayes factor in favor of H0
- The Bayes factor testing the hypothesis of a specific number of clusters, i.e., \(\mathcal{H}_1: B = b_1\), against \(\mathcal{H}_2: B = b_2\). Say we wish to test the null hypothesis of 1 cluster against the hypothesis that assumes there are two clusters (i.e., \(b_1 = 1\) and \(b_2 = 2\)). For this, we follow similar steps as above.
H1 <- unname(fit[["components"]][1, 2]) # posterior probability under H1 (i.e., B = 1)
H2 <- unname(fit[["components"]][2, 2]) # posterior probability under H2 (i.e., B = 2)
posterior_odds <- H1 / H2
The prior odds for this Bayes factor are given by: \[\lambda^{b_1 - b_2} \cdot \frac{b_2!}{b_1!}\]
prior_odds <- lambda^(1 - 2) * factorial(2) / factorial(1)
Finally, the Bayes factor is given by:
BF_12 <- posterior_odds / prior_odds
BF_12     # Bayes factor in favor of H1
1 / BF_12 # Bayes factor in favor of H2
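The steps above generalize to any pair of cluster numbers. A small hypothetical helper (not part of bgms) that wraps the calculation, assuming fit[["components"]] stores the posterior probabilities for 1 through \(p\) clusters in its second column:

```r
# Hypothetical convenience function: Bayes factor for H1: B = b1 vs H2: B = b2.
# Assumes `fit` is a bgm fit whose "components" element holds the candidate
# numbers of clusters in column 1 and their posterior probabilities in column 2.
cluster_bf <- function(fit, b1, b2, lambda = 1) {
  post <- unname(fit[["components"]][, 2])
  posterior_odds <- post[b1] / post[b2]
  prior_odds <- lambda^(b1 - b2) * factorial(b2) / factorial(b1)
  posterior_odds / prior_odds
}
```

For example, cluster_bf(fit, b1 = 1, b2 = 2) reproduces BF_12 from the code above.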
We would also like to note that if, for any reason, researchers wish to save the samples from the posterior distribution (i.e., when save = TRUE), then in order to obtain the summarized posterior clustering information, they need to use the function summarySBM as follows:
fit <- bgm(Wenchuan,
           edge_prior = "Stochastic-Block",
           save = TRUE)
cluster_summarized <- summarySBM(fit)
easybgm
Everything that we have illustrated using bgms can be done much more easily using easybgm. Currently, the version that contains this analysis is also only available on GitHub and can be installed as follows:
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github("sekulovskin/easybgm",
                        ref = "SBM_in_easybgm")
We simply perform the analysis as follows:
library(easybgm)
fit <- easybgm(dataset,
               type = "ordinal",
               edge_prior = "Stochastic-Block")
Now, we do not need to perform any post-processing, dig into the objects containing the results of the analysis, or calculate the Bayes factors "by hand." We can simply use the summary function, and all the relevant output will be displayed.
summary(fit)
After the edge-specific overview (for more information, see Huth et al. (2024)), we see the cluster-specific overview, which includes the posterior probabilities for all possible numbers of clusters, the estimated node membership, as well as the Bayes factors testing the hypothesis of clustering and the null hypothesis of no clustering.
In case you wish to calculate the second type of Bayes factor, similar to the steps above, you can simply use the function clusterBayesfactor, setting the argument type = "simple" and specifying the values for \(b_1\) and \(b_2\).
clusterBayesfactor(fit, type = "simple", b1 = 1, b2 = 2)