-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make pseudodata_table
correctly deal with multiple replicas
#2034
Conversation
@@ -343,7 +343,7 @@ def replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nn | |||
|
|||
replicas_nnseed_fitting_data_dict = collect("replica_nnseed_fitting_data_dict", ("replicas",)) | |||
groups_replicas_indexed_make_replica = collect( | |||
"indexed_make_replica", ("group_dataset_inputs_by_experiment", "replicas") | |||
"indexed_make_replica", ("replicas", "group_dataset_inputs_by_experiment") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the change of order necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thin it makes what we do in pseudodata_table
more readable. There we group the entries to groups_replicas_indexed_make_replica
corresponding to a given replica. I think that's easier to understand the way it's done now than if we had to take e.g. index 0 and then skip a number of indexes equal to the number of groups to get the second input corresponding to the same replica
df = [ | ||
pd.concat(groups_replicas_indexed_make_replica[i : i + groups_per_replica]) | ||
for i in range(0, len(groups_replicas_indexed_make_replica), groups_per_replica) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you achieve this with a reshape or permutation of groups_replicas_indexed_make_replica
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because groups_replicas_indexed_make_replica
is a list of indexed_make_replica
containing a number of dataframes equal to number_of_replicas x number_of_datagroups. It's not a really clean input.
Here I group the list items (all different data groups) that correspond to the same replica into a single dataframe for each replica
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just complicating your life here, but would it be possible to do something along the lines of
np.array(groups_replicas_indexed_make_replica).reshape(replicas, groups_per_replica)
?
(or the other way around)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because it's a list of dataframes and I do need to retain the information on the labels
Co-authored-by: Juan M. Cruz-Martinez <[email protected]>
Co-authored-by: Juan M. Cruz-Martinez <[email protected]>
44f0068
to
ddf7702
Compare
For the thcovmat alphas stuff it makes a big difference whether I use the central data or the average over data replicas. While looking into it I want to use this function with multiple replicas.