-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make pseudodata_table
correctly deal with multiple replicas
#2034
Changes from all commits
5432e68
4040eb3
1b46e3b
2722aa7
ddf7702
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ | |
Providers which prepare the data ready for | ||
:py:func:`n3fit.performfit.performfit`. | ||
""" | ||
|
||
from collections import defaultdict | ||
import functools | ||
import hashlib | ||
|
@@ -70,8 +71,7 @@ def __init__(self, group_name, seed, masks=None): | |
super().__init__(group_name, seed) | ||
|
||
def __iter__(self): | ||
for m in self.masks: | ||
yield m | ||
yield from self.masks | ||
|
||
|
||
def tr_masks(data, replica_trvlseed, parallel_models=False, replica=1, replicas=(1,)): | ||
|
@@ -343,7 +343,7 @@ def replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nn | |
|
||
replicas_nnseed_fitting_data_dict = collect("replica_nnseed_fitting_data_dict", ("replicas",)) | ||
groups_replicas_indexed_make_replica = collect( | ||
"indexed_make_replica", ("group_dataset_inputs_by_experiment", "replicas") | ||
"indexed_make_replica", ("replicas", "group_dataset_inputs_by_experiment") | ||
) | ||
|
||
|
||
|
@@ -359,10 +359,24 @@ def pseudodata_table(groups_replicas_indexed_make_replica, replicas): | |
`fitting::savepseudodata` is `true` (as per the default setting) and | ||
replicas are fitted one at a time. The table can be found in the replica | ||
folder i.e. <fit dir>/nnfit/replica_*/ | ||
|
||
""" | ||
# Concatenate over replicas | ||
df = pd.concat(groups_replicas_indexed_make_replica) | ||
# groups_replicas_indexed_make_replica is collected over both replicas and dataset_input groups, | ||
# in that order. What this means is that groups_replicas_indexed_make_replica is a list of size | ||
# number_of_replicas x number_of_data_groups. Where the ordering inside the list is as follows: | ||
# [data1_rep1, data2_rep1, ..., datan_rep1, ..., data1_repn, data2_repn, ..., datan_repn]. | ||
|
||
# To correctly put this into a single dataframe, we first need to know the number of | ||
# dataset_input groups there are for each replica | ||
groups_per_replica = len(groups_replicas_indexed_make_replica) // len(replicas) | ||
# then we make a list of pandas dataframes, each containing the pseudodata of all datasets | ||
# generated for a single replica | ||
df = [ | ||
pd.concat(groups_replicas_indexed_make_replica[i : i + groups_per_replica]) | ||
for i in range(0, len(groups_replicas_indexed_make_replica), groups_per_replica) | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why can't you achieve this with a reshape or permutation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. because Here I group the list items (all different data groups) that correspond to the same replica into a single dataframe for each replica There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just complicating your life here, but would it be possible to do something along the lines of
(or the other way around) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No because it's a list of dataframes and I do need to retain the information on the labels |
||
# then we concatentate the pseudodata of all replicas into a single dataframe | ||
df = pd.concat(df, axis=1) | ||
# and finally we add as column titles the replica name | ||
df.columns = [f"replica {rep}" for rep in replicas] | ||
return df | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the change of order necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thin it makes what we do in
pseudodata_table
more readable. There we group the entries togroups_replicas_indexed_make_replica
corresponding to a given replica. I think that's easier to understand the way it's done now than if we had to take e.g. index 0 and then skip a number of indexes equal to the number of groups to get the second input corresponding to the same replica