Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remake X: DiscreteDistribution should have pmv not pmf #1051

Closed
sbenthall opened this issue Aug 3, 2021 · 7 comments
Closed

Remake X: DiscreteDistribution should have pmv not pmf #1051

sbenthall opened this issue Aug 3, 2021 · 7 comments

Comments

@sbenthall
Copy link
Contributor

sbenthall commented Aug 3, 2021

Clarifying the scope of this ticket: It involves two changes to the DiscreteDistribution class:

  • replacing 'pmf' with 'pmv'
  • replacing 'X' with 'values'

https://github.com/econ-ark/HARK/blob/master/HARK/distribution.py#L816-L829

The class needs to be refactored with this change. Then, downstream code using DiscreteDistribution needs to be updated to reflect this change.


# CDC 20210621: We should not use pmf for the point masses of the realizations, because it
# is not a probability mass 'function.' Scipy.discrete_rv, sympy, and Mathematica all
# return pmf (or pdf) as a function. They have different syntaxes for retrieving the
# vector of point masses; here it is called the pmv (probability mass vector)


class DiscreteDistribution(DiscreteDistributionOld):
    __doc__ = DiscreteDistributionOld.__doc__
    __doc__ += """
        XYZ : np.array
            Restructure the dimensions of the np.array so that successive
            columns in XYZ correspond to the vectors of values of each
            discrete random variables across the broadcasted combinations.
            Thus, [instance].XYZ[0] will return the vector of discrete
            realizations of the first variable, [instance].XYZ[-1] will
            get the last RV, etc.
        pmv : The vector of point masses of the broadcasted collection
            of random variables
"""

    def __init__(self, pmf, X, seed=0):  # Get the old definition
        super().__init__(pmf, X, seed)  # execute the old def
        XYZ = np.column_stack(X)  # stack actually unstacks them
        if self.dim() == 1:  # numpy 1 dimensional arrays want to be
            XYZ = XYZ.T  # the wrong shape
        self.XYZ = XYZ  # [ N dimensional matrix ]
        # The pmf should be a function, not a vector (as now)
        # TODO: Replace invocations of pmf with pmv
        self.pmv = pmf

 
@sbenthall
Copy link
Contributor Author

Also, "X" should not be "X", but a more descriptive and lowercase term, like "values"

@sbenthall
Copy link
Contributor Author

See discussion here:
https://github.com/econ-ark/HARK/pull/1146/files#r883016319

Because of the ambiguity about the semantics of "X" -- a dimension of X can mean either a dimension of the state values of of random variable, or the 'nature's choice' dimension -- it would be better if X/values were a different kind of data structure that makes these differences more explicit.

For example, if X were a Pandas dataframe, the index could be nature's choice, and the columns would be dimensions of the RV value. However, that might not work for matrix-valued (as opposed to vector-valued) RVs.

@sbenthall
Copy link
Contributor Author

It might be possible to take inspiration, or an implementation, from another distribution library.

https://github.com/econ-ark/HARK/wiki/Distribution-Libraries

@sbenthall sbenthall changed the title DiscreteDistribution should have pmv not pmf Remake X: DiscreteDistribution should have pmv not pmf Jun 1, 2022
@alanlujan91 alanlujan91 self-assigned this Aug 3, 2022
@alanlujan91
Copy link
Member

xarray.DataArray has attributes values and data by default. The DiscreteDistributionXRA could easily make this change without significant work.

@llorracc how about data and prob?

@alanlujan91 alanlujan91 mentioned this issue Aug 3, 2022
3 tasks
@llorracc
Copy link
Collaborator

llorracc commented Aug 3, 2022 via email

@sbenthall
Copy link
Contributor Author

scipy.stats.rv_discrete uses values() to access both the atoms and the probabilities:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete

@alanlujan91
Copy link
Member

At the last meeting I think we had consensus on .pmv and .atoms. I'll work on that.

@sbenthall sbenthall modified the milestones: 1.0.0, 0.13.0 Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants