
[WIP] Add shards_cluster for PG support in OTP23 #49

Closed
wants to merge 5 commits

Conversation

@alisinabh commented Sep 20, 2020

#47

  • Change all pg2 usages to shards_cluster in other modules and tests.
  • Implement pg2 proxy functions in shards_cluster (sketched below).
  • Implement local pg operations in shards_cluster.
  • Implement remote pg operations to handle remote join/leave.
  • Handle join/2 and leave/2 on lists of pids.
  • Implement delete/1 for pg using leave/2 on all pids.
  • Wait for completion of remote join and leave, since concurrency otherwise makes the return value of shards:join/leave invalid. (Not possible due to the propagation behavior of the pg implementation.)
  • Write tests for shards_cluster.
  • Make sure pg is started before shards.
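
For context, a rough sketch (not the PR's actual shards_cluster code) of what pg2-style proxy functions over OTP 23's pg could look like; the default scope, the node-dispatch helper, and the function bodies are assumptions based on the checklist above:

-module(shards_cluster_sketch).

-export([join/2, leave/2, delete/1]).

%% Accept a single pid or a list of pids, mirroring the pg2-style API.
join(Group, Pids) when is_list(Pids) ->
    lists:foreach(fun(Pid) -> join(Group, Pid) end, Pids);
join(Group, Pid) when is_pid(Pid) ->
    apply_on_owner(node(Pid), pg, join, [Group, Pid]).

leave(Group, Pids) when is_list(Pids) ->
    lists:foreach(fun(Pid) -> leave(Group, Pid) end, Pids);
leave(Group, Pid) when is_pid(Pid) ->
    apply_on_owner(node(Pid), pg, leave, [Group, Pid]).

%% pg has no delete/1, so "deleting" a group means making every member leave.
delete(Group) ->
    leave(Group, pg:get_members(Group)).

%% pg:join/2 and pg:leave/2 expect the pid to be local to the calling node,
%% so run the call on the pid's owner node when it is remote.
apply_on_owner(Node, M, F, A) when Node =:= node() ->
    apply(M, F, A);
apply_on_owner(Node, M, F, A) ->
    rpc:call(Node, M, F, A).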

join(Group, Pid) ->
    %% HACK: Maybe implement apply_on_target?
    OwnerNode = node(Pid),
    if
@cabol (Owner):

Let's use case instead
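
Purely as an illustration of the suggestion (not code from this PR), the if could become something like the following; the exact condition is an assumption, guessed from the OwnerNode = node(Pid) line above:

join(Group, Pid) ->
    %% Branch with case instead of if; assumes the check is "is the pid local?".
    case node(Pid) of
        Node when Node =:= node() ->
            pg:join(Group, Pid);
        OwnerNode ->
            rpc:call(OwnerNode, pg, join, [Group, Pid])
    end.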

@alisinabh (Author) commented:

Hi @cabol,

So everything is implemented except the tests, but there are some issues I need your help with.

The biggest issue is that when we call leave or join on pg, the members list isn't updated immediately, because pg is optimized for speed and does not guarantee consistency. So one test was failing because the list of nodes was not updated right after the leave; the failing test was shards_dist_SUITE.erl:146. I saw you had a sleep in there too, and I assume you anticipated this issue well before pg was even a thing :))

Another issue I'm having is that pg does not start automatically, and its documentation says we should set kernel parameters for that. Can you give me some hints on how to do that in this project?
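
For what it's worth, the kernel parameter the pg docs point at is start_pg; a minimal sketch of how it might be set for this project (the exact config file and placement are assumptions) looks like this:

%% sys.config (or the test suite's config): ask kernel to start the default
%% pg scope on boot (OTP 23+).
[
 {kernel, [
   {start_pg, true}
 ]}
].

%% Alternatively, start the default scope explicitly, e.g. from a supervisor
%% or a test's init_per_suite (hypothetical placement):
%%   {ok, _Pid} = pg:start_link().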

P.S. I think the CI system may be malfunctioning!? I'm not quite sure!

Thanks.

@@ -143,7 +143,10 @@ t_join_leave_ops(Config) ->

% leave node E from SET
OkNodes2 = lists:usort([node() | lists:droplast(OkNodes1)]),
OkNodes2 = shards:leave(?SET, [ENode]),
@alisinabh (Author):

This line sometimes fails due to the pg propagation issue. I've just ignored the result for now!
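
A hedged alternative to ignoring the result (the helper below is made up for illustration and is not part of this PR): poll until pg propagation settles, instead of asserting immediately or sleeping a fixed amount.

%% Hypothetical test helper: retry a boolean check until it holds or the
%% attempts run out, to tolerate pg's eventual consistency.
wait_until(Fun, Retries, IntervalMs) when Retries > 0 ->
    case Fun() of
        true -> ok;
        _ when Retries =:= 1 -> error(wait_until_timeout);
        _ -> timer:sleep(IntervalMs), wait_until(Fun, Retries - 1, IntervalMs)
    end.

%% Possible use at this spot (get_set_nodes/0 stands in for however the suite
%% reads the SET's current node list):
%%   shards:leave(?SET, [ENode]),
%%   ok = wait_until(fun() -> OkNodes2 =:= get_set_nodes() end, 50, 100).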

@cabol (Owner) commented Oct 2, 2020

Hi @alisinabh, first of all, thanks a lot for the effort 😄!!

The biggest issue is that when we call leave or join on pg, the members list isn't updated immediately, because pg is optimized for speed and does not guarantee consistency. So one test was failing because the list of nodes was not updated right after the leave; the failing test was shards_dist_SUITE.erl:146. I saw you had a sleep in there too, and I assume you anticipated this issue well before pg was even a thing :))

Yeah, agreed, the join and leave functions have to be very different, considering all that you mention. And about the test, I had to put the sleep in because yes, I was having a sync issue since the group was not updated immediately across the cluster after leave, but I wasn't anticipating this 😂.

Another issue I'm having is that pg does not start automatically, and its documentation says we should set kernel parameters for that. Can you give me some hints on how to do that in this project?

Overall, I was checking out the issues you mention and also the pg docs, and I think this is going to be a bit more complicated than I expected: there are significant differences in the implementation, so there are several parts we have to refactor. Because of this, I'd like to propose something: one of my ideas is to move shards_dist into a separate repository and continue from there. It will be easier because we'll have more flexibility to make changes without affecting shards. Once we have shards_dist in its own repository, I think we can either continue with the current approach or change it completely if that turns out to be best; the point is, we will have more flexibility in the implementation. What do you think? I'd really like to know your thoughts.

P.S.: I could work on this refactor this weekend, moving shards_dist to another repo and leaving shards completely isolated, only for local sharding.

@alisinabh (Author) commented:

Thank you @cabol,

pg is more complicated than it seemed initially. I completely agree with you on separating the repositories, since a quick fix at this stage is not possible without messing with the shards project configuration, and that isn't necessary.

I'll still be happy to help with this. Please let me know if there is anything I can do.

Thanks again.

@alisinabh closed this Oct 2, 2020