The bottleneck is the generation of pivot locations: it draws real values in [0, 1) from a Beta(1.6, 4.3) distribution, multiplies each by the number of columns, truncates to an integer, and repeats until enough distinct values have been collected.
    import sage.probability.probability_distribution as pd
    num_col = 46
    num_pivots = num_col
    import time
    start = time.time()
    pivots = [0]  # Force first column to be a pivot. No harm if no pivots at all.
    pivot_generator = pd.RealDistribution("beta", [1.6, 4.3])
    while len(pivots) < num_pivots:
        pivot_column = int(pivot_generator.get_random_element() * num_col)
        if pivot_column not in pivots:
            pivots.append(pivot_column)
    print(time.time() - start)
This snippet may take up to a few seconds.
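To see why the full-rank case is so slow, it helps to estimate how likely a single draw is to land in the last column. The sketch below is plain Python rather than Sage, and the helper names `beta_pdf` and `column_probability` are made up for illustration. It integrates the Beta(1.6, 4.3) density over the interval that truncates to a given column, using a simple midpoint rule.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, using only the standard library."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def column_probability(col, num_col, a=1.6, b=4.3, steps=10_000):
    """Midpoint-rule estimate of P(int(X * num_col) == col) for X ~ Beta(a, b)."""
    lo = col / num_col
    width = 1 / num_col / steps
    # Evaluate the density at the midpoint of each sub-interval of [lo, lo + 1/num_col).
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width

num_col = 46
p_last = column_probability(num_col - 1, num_col)
print(f"P(one draw hits the last column) ~ {p_last:.2e}")
print(f"expected draws until it is hit   ~ {1 / p_last:.2e}")
```

With 46 columns this probability comes out on the order of 10^-7, so the loop needs several million draws on average just to collect the last column. That is a lower bound on the expected total number of draws when num_pivots equals num_col.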
I can see a few possible ways to fix it:
- Hard-code the full-rank case, which is both frequent and the slowest.
- Sample from the distribution without using a loop.
- Switch to a probability distribution that is easier to sample from (backwards incompatible!). For example, with the uniform distribution the loop takes only O(n log n) expected steps by the coupon collector problem, and there are algorithms that take only O(n) steps.
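A minimal sketch of how the first and third options could fit together, in plain Python rather than Sage. The function name `random_pivots` is hypothetical, and the sketch deliberately swaps the Beta distribution for a uniform one, so it is not a drop-in replacement for the current behaviour. It keeps the original convention of forcing column 0 to be a pivot.

```python
import random

def random_pivots(num_col, num_pivots, rng=random):
    """Hypothetical helper: pick num_pivots distinct pivot columns uniformly.

    Assumption: uniform sampling replaces the Beta(1.6, 4.3) draw, which
    changes the distribution of the output (backwards incompatible).
    """
    if not 0 <= num_pivots <= num_col:
        raise ValueError("num_pivots must be between 0 and num_col")
    if num_pivots == 0:
        return []
    if num_pivots == num_col:
        # Full rank: every column is a pivot, nothing to sample.
        return list(range(num_col))
    # Column 0 is always a pivot, as in the original snippet; the remaining
    # pivots are drawn without replacement, so no rejection loop is needed.
    rest = rng.sample(range(1, num_col), num_pivots - 1)
    return sorted([0] + rest)

print(random_pivots(46, 46)[:5])
print(random_pivots(46, 10))
```

Sampling without replacement avoids the retry loop entirely, and the full-rank shortcut removes the worst case regardless of which distribution is used for the other cases.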
I agree that this is a very bad way to generate a subset of {0, 1, ..., n-1}. I suggest moving the subset generation out of random_rref_matrix(parent, num_pivots). More precisely:
- having somewhere one (or several) function(s) generating pivots (i.e. a random subset generator)
- having a function generating a matrix from a given set of pivots
For the second item one could slightly change the signature of random_rref_matrix to random_rref_matrix(parent, pivots) where pivots is expected to be a subset of {0, 1, ..., n-1}. If it is provided as an integer we could interpret it as a number of pivots and generate a random subset according to some default distribution (first item).
I think it is perfectly fine to change the default distribution as long as it remains reasonable.
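The proposed split could look roughly like the following sketch, again in plain Python. The names `uniform_subset` and `resolve_pivots` are hypothetical and not part of Sage, and the uniform default is only one possible "reasonable" distribution. A function such as random_rref_matrix(parent, pivots) could call something like `resolve_pivots` first and then build the matrix from the resulting columns.

```python
import random
from numbers import Integral

def uniform_subset(n, k):
    """Hypothetical default generator: a uniform k-subset of {0, ..., n-1}."""
    return sorted(random.sample(range(n), k))

def resolve_pivots(num_col, pivots, subset_generator=uniform_subset):
    """Return a sorted list of pivot columns.

    pivots may be an integer (a number of pivots, resolved with
    subset_generator) or an explicit collection of column indices.
    """
    if isinstance(pivots, Integral):
        if not 0 <= pivots <= num_col:
            raise ValueError("number of pivots must be between 0 and num_col")
        return subset_generator(num_col, int(pivots))
    cols = list(pivots)
    if len(set(cols)) != len(cols):
        raise ValueError("pivot columns must be distinct")
    if any(c < 0 or c >= num_col for c in cols):
        raise ValueError("pivot columns must lie in {0, ..., num_col - 1}")
    return sorted(cols)

print(resolve_pivots(5, [3, 0, 1]))  # explicit subset
print(resolve_pivots(5, 2))          # integer: random subset of size 2
```

Keeping the generator as a parameter means the default distribution can change later without touching the matrix construction code.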
trying to do something for sagemath#35664 ; not yet good
### 📝 Checklist
- [x] The title is concise, informative, and self-explanatory.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.
URL: sagemath#36313
Reported by: Frédéric Chapoton
Reviewer(s): David Coudert, Frédéric Chapoton
Expected Behavior
Should be near-instant.
Actual Behavior
As above.
Additional Information
The pivot-generation loop described above is at sage/src/sage/matrix/special.py, lines 2512 to 2519 in commit 6ba0eaf. Extracted into a standalone snippet, it may take up to a few seconds.