Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make pseudodata_table correctly deal with multiple replicas #2034

Merged
merged 5 commits into from
Apr 11, 2024

Conversation

RoyStegeman
Copy link
Member

For the thcovmat alphas stuff it makes a big difference whether I use the central data or the average over data replicas. While looking into it I want to use this function with multiple replicas.

@@ -343,7 +343,7 @@ def replica_nnseed_fitting_data_dict(replica, exps_fitting_data_dict, replica_nn

replicas_nnseed_fitting_data_dict = collect("replica_nnseed_fitting_data_dict", ("replicas",))
groups_replicas_indexed_make_replica = collect(
"indexed_make_replica", ("group_dataset_inputs_by_experiment", "replicas")
"indexed_make_replica", ("replicas", "group_dataset_inputs_by_experiment")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the change of order necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thin it makes what we do in pseudodata_table more readable. There we group the entries to groups_replicas_indexed_make_replica corresponding to a given replica. I think that's easier to understand the way it's done now than if we had to take e.g. index 0 and then skip a number of indexes equal to the number of groups to get the second input corresponding to the same replica

validphys2/src/validphys/n3fit_data.py Outdated Show resolved Hide resolved
validphys2/src/validphys/n3fit_data.py Outdated Show resolved Hide resolved
df = [
pd.concat(groups_replicas_indexed_make_replica[i : i + groups_per_replica])
for i in range(0, len(groups_replicas_indexed_make_replica), groups_per_replica)
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you achieve this with a reshape or permutation of groups_replicas_indexed_make_replica

Copy link
Member Author

@RoyStegeman RoyStegeman Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because groups_replicas_indexed_make_replica is a list of indexed_make_replica containing a number of dataframes equal to number_of_replicas x number_of_datagroups. It's not a really clean input.

Here I group the list items (all different data groups) that correspond to the same replica into a single dataframe for each replica

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just complicating your life here, but would it be possible to do something along the lines of

np.array(groups_replicas_indexed_make_replica).reshape(replicas, groups_per_replica) ?

(or the other way around)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because it's a list of dataframes and I do need to retain the information on the labels

@RoyStegeman RoyStegeman merged commit 0207a00 into master Apr 11, 2024
6 checks passed
@RoyStegeman RoyStegeman deleted the rs-quickfix branch April 11, 2024 14:16
@scarlehoff scarlehoff mentioned this pull request May 20, 2024
33 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants