Bug: incorrect number of blocks in "membership" list for "estimateMultiplexSBM" method #9

Open
AdityaIyer01 opened this issue Mar 8, 2023 · 7 comments

@AdityaIyer01

The estimateMultiplexSBM method fits the model for different values of K (the number of blocks) and selects the value of K that produces the highest ICL. However, I found that the number of blocks in the memberships and indMemberships lists was not equal to the optimal K.

Here is an example from a run of the code. The estimateMultiplexSBM method fit the model for each K in {1, 2, 3, 4} and found that the ICL was highest at K=4. However, the indMemberships list looked something like this:
[image: screenshot of the indMemberships list]
Note: these are only the first 9 nodes.

As you can see, there are only 3 blocks present when there should be 4.
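
The mismatch described above can be checked programmatically rather than by eye. The following is a minimal base R sketch; the label vector `labels` and the value `selected_K` are made up for illustration (the original screenshot is not reproduced here), so only the comparison logic carries over to a real fit.

```r
# Hypothetical hard block labels for the first 9 nodes (invented values).
labels <- c(1, 1, 2, 3, 3, 2, 1, 3, 2)

# Hypothetical number of blocks selected by the ICL criterion.
selected_K <- 4

# Blocks that actually appear in the label vector.
observed_blocks <- sort(unique(labels))

# Blocks that the selected model has but that no node is assigned to.
missing_blocks <- setdiff(seq_len(selected_K), observed_blocks)

cat("Blocks observed:", length(observed_blocks), "of", selected_K, "\n")
cat("Blocks with no assigned node:", missing_blocks, "\n")
```

A non-empty `missing_blocks` means at least one block of the selected model receives no node under the hard assignment.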

I assume this is because the memberships and indMemberships lists are not updated to store the memberships of the model with the highest ICL. They may be storing the memberships of the model with the second-highest ICL (which is probably the second-to-last model to be fit).

Is there a way to patch this quickly?

@Sophiedonnet
Member

Could you please provide a reproducible example?
We have never encountered this type of problem.
Kind regards
Sophie

@AdityaIyer01
Author

AdityaIyer01 commented Mar 15, 2023

Here is an example of code that produces the issue. When I ran the estimateMultiplexSBM method, it found the optimal K to be equal to 5. However, the indMemberships parameter only had 4 blocks.

# Load the sbm package, which provides defineSBM and estimateMultiplexSBM
library(sbm)

# Set the seed to 1234 to reproduce the results
set.seed(1234)

# Load the matrix for the first layer (mat1). I saved it locally as a
# binary file in the Downloads folder.
file_path <- file.path(Sys.getenv("USERPROFILE"), "Downloads", "mat1.bin")
load(file_path)

# Load the matrix for the second layer (mat2). I saved it locally as a
# binary file in the Downloads folder.
file_path <- file.path(Sys.getenv("USERPROFILE"), "Downloads", "mat2.bin")
load(file_path)

# Define the two layers and collect them in a list
listSBM <- list(defineSBM(mat1, model = "gaussian", dimLabels = " "),
                defineSBM(mat2, model = "gaussian", dimLabels = " "))

# Run the estimateMultiplexSBM method and store the results
Results <- estimateMultiplexSBM(listSBM, dependent = FALSE)

# View the indMemberships variable
Results$indMemberships

I have attached the two matrices used in the code. Both are 80x80 symmetric matrices. I found that this issue occurs only when the matrices are large. When the matrices are sufficiently small (e.g. 50x50), the issue does not occur.

Please let me know if this is still an issue on your end.

Thank you,
Aditya

Matrices.zip

@Demiperimetre
Contributor

Thank you, Aditya, for the reproducible code.

We investigated why Results$indMemberships gives you only 4 blocks instead of 5.
The inference reaches an ambiguous configuration in which some nodes have equal probabilities of belonging to block 4 or block 5.
You can see this by inspecting Results$probMemberships.

Results$indMemberships is computed by running apply(Results$probMemberships[[1]], 1, which.max).
Because which.max returns the first maximum when there is a tie, these nodes are all assigned to block 4, so the 5th block never appears.
In your case, this suggests that the assumption that the two networks share the same block clustering is questionable.
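
The tie behaviour described above can be reproduced in base R without the package. The sketch below uses a made-up 4-node, 3-block probability matrix (`prob` is hypothetical and not taken from the attached data). It relies on the documented behaviour of `which.max`, which returns the first index of the maximum, so a block that only ever ties with an earlier block receives no node.

```r
# Toy posterior membership probabilities: rows are nodes, columns are blocks.
# These values are invented for illustration.
prob <- matrix(
  c(0.9, 0.1, 0.0,
    0.8, 0.1, 0.1,
    0.0, 0.5, 0.5,   # exact tie between blocks 2 and 3
    0.0, 0.5, 0.5),  # exact tie between blocks 2 and 3
  nrow = 4, byrow = TRUE
)

# Hard assignment, computed row by row with which.max as in the comment above.
hard <- apply(prob, 1, which.max)

# Blocks in the model versus blocks that appear in the hard labels.
n_model_blocks <- ncol(prob)
n_used_blocks  <- length(unique(hard))

# Flag nodes whose two largest probabilities are numerically equal.
tied <- apply(prob, 1, function(p) {
  s <- sort(p, decreasing = TRUE)
  isTRUE(all.equal(s[1], s[2]))
})

print(hard)
print(c(model = n_model_blocks, used = n_used_blocks))
print(which(tied))
```

Running the same `tied` check on a real `probMemberships` matrix would list the nodes whose hard label is decided by column order alone.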

By the way, even if it doesn't change much, we suggest that you also run
rownames(mat1) <- colnames(mat1)
rownames(mat2) <- colnames(mat2)
so that R treats these two matrices as symmetric.
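
The reason row and column names matter can be seen with base R's `isSymmetric`, which reports a matrix as symmetric only if its row names and column names are identical. The sketch below uses a small invented matrix `m`. Whether the package relies on exactly this check is an assumption here, not something stated in the thread.

```r
# A numerically symmetric 2x2 matrix (invented values).
m <- matrix(c(0, 2, 2, 0), nrow = 2)

# Only column names are set; row names are still NULL.
colnames(m) <- c("a", "b")
sym_before <- isSymmetric(m)

# Copy the column names to the rows, as suggested above.
rownames(m) <- colnames(m)
sym_after <- isSymmetric(m)

print(c(before = sym_before, after = sym_after))
```

The values are symmetric in both cases; only the mismatched dimnames make the first check fail.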

Best,

@AdityaIyer01
Author

Thank you for the response. I just have one quick follow-up question.

If there are nodes that have an equal probability of belonging to block 4 or block 5, then how is the blockProp measure computed? Are nodes that can belong to either block 4 or block 5 randomly assigned to one of the two blocks?

Best,
Aditya Iyer

@AdityaIyer01
Author

I also have another quick follow-up.

I want to modify the code to evaluate the model for just one value of K (e.g. K = 3). How would I change the code to make this happen? I am not entirely sure how the nbBlocksRange list in the estimOptions parameter is configured.
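
One possible way to express this is sketched below. It is untested against the package and rests on assumptions: that `nbBlocksRange` is a list with one `c(min, max)` entry per set of nodes (a multiplex network has a single set), and that setting min equal to max restricts the search to that K. The package documentation may also require the list entries to be named after the node labels. `target_K` and `estim_options` are hypothetical names.

```r
# Hypothetical target number of blocks.
target_K <- 3

# Assumption: one c(min, max) entry per set of nodes; min == max pins K.
# The entry may need a name matching the node labels (not verified here).
estim_options <- list(nbBlocksRange = list(c(target_K, target_K)))

str(estim_options)

# Untested sketch of the call, reusing listSBM from the example earlier
# in this thread:
# Results <- estimateMultiplexSBM(listSBM, dependent = FALSE,
#                                 estimOptions = estim_options)
```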

Best,
Aditya

@Demiperimetre
Contributor

Demiperimetre commented Mar 16, 2023 via email

@Demiperimetre
Contributor

Demiperimetre commented Mar 16, 2023 via email
