Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistencies in disjoint_union after addressing issue #761 #1587

Closed
bockthom opened this issue Nov 20, 2024 · 10 comments · Fixed by #1641
Closed

Inconsistencies in disjoint_union after addressing issue #761 #1587

bockthom opened this issue Nov 20, 2024 · 10 comments · Fixed by #1641

Comments

@bockthom
Copy link

bockthom commented Nov 20, 2024

What happens, and what did you expect instead?

In issue #761, I previously had reported unintended type conversions in disjoint_union.
These have been fixed in PR #1375 by using vctrs::vec_c() instead of c().
Finally, the corresponding fixes have been released with version 2.1.1 of rigraph.

Since the release of version 2.1.1, we tried to adjust our usage of disjoint_union to the new behavior (which enforced us to make more changes to our code than expected, as now, after simplification with edge attribute concatenation, all values of these attributes need to be lists instead of characters, for instance). While we managed to adjust our code to the new behavior of disjoint_union regarding lists and other types of attributes, we spotted two inconsistencies in the fixes carried out in PR #1375:


(1) Missing attributes are now handled inconsistently:

According to igraph's documentation, missing attributes shall be replaced by NA values.
Quoting igraph docs on disjoint_union:

For graphs that lack some vertex/edge attribute, the corresponding values in the new graph are set to NA.

However, this is not always the case in the new implementation, as the following minimum working example shows:

# Create a graph with duplicate edges and 'attr.one' edge attribute
graph.one = igraph::make_empty_graph() +
     igraph::vertices("A", "B") +
     igraph::edges(c("A", "B", "A", "B"), attr.one = "test")

# Simplify both edges into one. Use the "concat" strategy for the 'attr.one' edge-attribute.
graph.one = igraph::simplify(graph.one, edge.attr.comb = list(attr.one = "concat"))

# Create a second graph without the 'attr.one' edge attribute, but with 'attr.two' edge attribute
graph.two = igraph::make_empty_graph() +
     igraph::vertices("C", "D") +
     igraph::edges(c("C", "D"), attr.two = "test")

# Join both graphs
union = igraph::disjoint_union(graph.one, graph.two)

After running this code you can observe how the non-existence of attr.one in edges of graph.two is expressed as a NULL instead of NA, while the non-existence of attr.two in edges of graph.one is expressed as an NA:

> str(igraph::as_data_frame(union,"edges"))
'data.frame':   2 obs. of  4 variables:
 $ from    : chr  "A" "C"
 $ to      : chr  "B" "D"
 $ attr.one:List of 2
  ..$ : chr  "test" "test"
  ..$ : NULL
 $ attr.two: chr  NA "test"

So, there is an inconsistency: Missing values of the attribute that is not a list is replaced by NA (consistent with previous implementations and with the documentation), whereas missing values for the attribute that is a list are replaced by NULL. As we (and potentially other users) rely on NA as a consistently used indicator for non-existing attribute values, we consider this a bug resulting from the fixes made in PR #1375.


(2) Inconsistency between edge and vertex attributes:

While PR #1375 only addressed edge attributes, vertex attributes potentially suffer from similar problems as those reported in #761. That is, the changes of PR #1375 have been carried out in lines 252-258 of R/operators.R regarding edge attributes, while there is very similar code for vertex attributes in the same file a few lines above (lines 227-233), which has not been edited in PR #1375.


Could you please take care of the two issues described above? Thanks in advance!

CC: @hechtlC @maxloeffler

@bockthom
Copy link
Author

Sorry if I'm a little impatient. These suspected inconsistencies in the most recent update (version 2.1.1 of rigraph) are currently impacting our project. Is there an ETA on when you are able to have a look at it? Thanks in advance.

@maxloeffler
Copy link

I also ran into a similar, potentially related, problem with add_edges. When adding edges to a network it does make a difference whether edge attributes are lists or not. When they are lists, empty positions are filled by NULLs and if they aren't, then they are filled with NAs. Here is a minimum working example matching the style of the above:

# Create a graph with duplicate edges and 'attr.one' edge attribute
graph.one = igraph::make_empty_graph() +
     igraph::vertices("A", "B") +
     igraph::edges(c("A", "B", "A", "B"), attr.one = "test")

# Simplify both edges into one. Use the "concat" strategy for the 'attr.one' edge-attribute.
graph.one = igraph::simplify(graph.one, edge.attr.comb = list(attr.one = "concat"))

# Create an identical graph to 'graph.one' but with 'attr.two' edge attribute
graph.two = igraph::make_empty_graph() +
     igraph::vertices("A", "B") +
     igraph::edges(c("A", "B", "A", "B"), attr.two = "test")

# Do not simplify graph.two so that 'attr.two' stays singular

# Add edge of 'graph.one' to 'graph.two'
edge.attributes = igraph::edge_attr(graph.one)
union = igraph::add_edges(graph.two, c("A", "B"), attr = edge.attributes)

After running this we observe the following:

> str(igraph::as_data_frame(union, "edges"))
'data.frame':   3 obs. of  4 variables:
 $ from    : chr  "A" "A" "A"
 $ to      : chr  "B" "B" "B"
 $ attr.two: chr  "test" "test" NA
 $ attr.one:List of 3
  ..$ : NULL
  ..$ : NULL
  ..$ : chr  "test" "test"

Could you please also address this problem when dealing with the above? Thank you!

@krlmlr krlmlr added this to the 2.1.3 milestone Dec 16, 2024
@krlmlr
Copy link
Contributor

krlmlr commented Dec 20, 2024

Thanks. The first issue reported by Thomas, and the related problem reported by Max, is expected: NULL is the canonical missing value for lists.

vctrs::vec_detect_missing(list(1:2, NULL, 3))
#> [1] FALSE  TRUE FALSE

Created on 2024-12-20 with reprex v2.1.1

Regarding the second issue, I believe this is a reproducible example. Can you please confirm?

options(conflicts.policy = list(warn = FALSE))
library(igraph)

# create two graphs
g1 <- make_graph(~ B - -C, C - -D)
g2 <- make_graph(~ A - -G, E - -F)

# add an edge attribute of class POSIXct to each graph
g1 <- set_vertex_attr(g1, "date", value = as.POSIXct(c("2021-01-01 01:01:01", "2022-02-02 02:02:02", "2023-03-03 03:03:03")))
g2 <- set_vertex_attr(g2, "date", value = as.POSIXct(c("2021-03-03 03:03:03", "2022-04-04 04:04:04", "2023-05-05 05:05:05", "2024-06-06 06:06:06")))

# create the union of the two graphs
u <- disjoint_union(g1, g2)

# now print the structure of g1, g2, and u
vertex_attr(u, "date")
#> [1] 1609459261 1643763722 1677808983 1614736983 1649037844 1683255905 1717646766

Created on 2024-12-20 with reprex v2.1.1

@krlmlr krlmlr removed this from the 2.1.3 milestone Dec 20, 2024
@krlmlr
Copy link
Contributor

krlmlr commented Dec 20, 2024

Removing from the milestone, not a regression.

@bockthom
Copy link
Author

bockthom commented Dec 21, 2024

Thanks. The first issue reported by Thomas, and the related problem reported by Max, is expected: NULL is the canonical missing value for lists.

Yes, it is true, that NULL is the canonical missing value for lists. However, this is not expected in the first issue reported by me, which is not about lists but about edge-attribute values.

First of all, the documentation of disjoint_union explicitly states that missing values are set to NA. In the new implementation, this does not hold for all missing values any more, as some are replaced by NULL.

In addition, the new implementation breaks with the previous implementation that existed for many years. Let's take the code example from my initial post and run it once with igraph 1.6.0 and once with igraph 2.1.2:

Result with igraph 1.6.0 (and earlier versions of igraph):

> str(igraph::as_data_frame(union,"edges"))
'data.frame':	2 obs. of  4 variables:
 $ from    : chr  "A" "C"
 $ to      : chr  "B" "D"
 $ attr.one:List of 2
  ..$ : chr  "test" "test"
  ..$ : logi NA
 $ attr.two: chr  NA "test"

Result with igraph 2.1.2:

> str(igraph::as_data_frame(union,"edges"))
'data.frame':   2 obs. of  4 variables:
 $ from    : chr  "A" "C"
 $ to      : chr  "B" "D"
 $ attr.one:List of 2
  ..$ : chr  "test" "test"
  ..$ : NULL
 $ attr.two: chr  NA "test"

You can clearly see that earlier igraph versions consistently set missing values for edge/vertex attributes to NA, even within lists. This was consistent with the documentation of disjoint_union and consistent among the different possibilities how missing values can be introduced (also when simplifying graphs). The new implementation since igraph 2.1.1 violates the documentation and handles missing values differently for simplified and non-simplified graphs, which makes it more complicated to handle missing values when iterating over the edge-attribute values, as depending on whether simplifications were involved or not the attribute values can differ. So, I assume that the result of the previous implementation that always set missing values to NA (even in lists) matches users' expectations more than the current implementation that introduces NULL when missing attribute values are converted to lists and NA otherwise.

@bockthom

This comment was marked as off-topic.

@krlmlr

This comment was marked as off-topic.

@krlmlr
Copy link
Contributor

krlmlr commented Dec 30, 2024

You're right, the documentation is inconsistent with the behavior. Fixing the documentation because I'm convinced that this is the more robust way, even in the short term. The following code sheds some light on what's happening, and offers a workaround that you can apply if needed.

# Base R behavior
base_missing <- list(1:2, NA_integer_, 3L, c(NA_integer_, NA_integer_))
anyNA(base_missing)
#> [1] TRUE
is.na(base_missing[2])
#> [1] TRUE
is.na(base_missing[4])
#> [1] FALSE

# This feels very odd
is.na(NA_integer_)
#> [1] TRUE
is.na(list(NA_integer_))
#> [1] TRUE
is.na(list(NA_integer_, NA_integer_))
#> [1] TRUE TRUE
is.na(list(list(NA_integer_)))
#> [1] FALSE
is.na(list(NULL))
#> [1] FALSE

# vctrs behavior

vctrs_missing <- list(1:2, NULL, 3L, c(NA_integer_, NA_integer_))

# Can no longer be used
anyNA(vctrs_missing)
#> [1] FALSE
is.na(vctrs_missing[2])
#> [1] FALSE

# Alternative
vctrs::vec_detect_missing(vctrs_missing)
#> [1] FALSE  TRUE FALSE FALSE

# Consistent behavior
vctrs::vec_detect_missing(list(NA_integer_))
#> [1] FALSE

# Workaround if downstream code relies on base R behavior
vctrs_missing[vctrs::vec_detect_missing(vctrs_missing)] <- list(NA_integer_)
identical(vctrs_missing, base_missing)
#> [1] TRUE

Created on 2024-12-30 with reprex v2.1.1

@bockthom
Copy link
Author

I still don't understand why you favor vctrs behavior over base R behavior, as I would assume that most users are more familiar with R base behavior than with vctrs behavior. Relying on vctrs behavior now makes downstream code potentially buggy as users that are not aware of this issue discussion here might not know that a breaking change had been introduced and that they need a workaround now that replaces NULL with NA, which makes things more complicated than necessary. Nevertheless, thanks for your explanations and for demonstrating the behaviors of vctrs and base R. I understand why vctrs has the described behavior, but it still feels inconsistent when abstracting from the underlying data structure for storing edge attributes and just interpreting the edge attribute values.

And regarding your fix of the documentation:
The documentation changes regarding disjoint_union in PR #1641 do not fully cover all the inconsistencies pointed out in this issue. As pointed out by @maxloeffler, add_edges suffers from a similar problem - also there, the documentation only mentions NA values, but not NULL values. So, I guess the documentation of add_edges would need a similar treatment as done in PR #1641. Therefore, I'd suggest to reopen this issue until all the mentioned inconsistencies are eventually fixed.

@krlmlr
Copy link
Contributor

krlmlr commented Dec 31, 2024

Thanks. In edge cases like this, it's difficult to find a consensus that satisfies all objectives at the same time. Relying on vctrs lifts a heavy load from our shoulders. For future discussions, it would help me to see what you want to achieve in the end -- as of now, I only have a vague idea of the motivating problem.

@maelle: can you please review the add_edges() documentation and open a new issue or PR as necessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants