Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Tensor SetItem for Growing Tensor #92

Closed
ntjohnson1 opened this issue Apr 18, 2023 · 11 comments
Closed

Fix Tensor SetItem for Growing Tensor #92

ntjohnson1 opened this issue Apr 18, 2023 · 11 comments
Assignees
Labels
bug Something isn't working doing Actively being worked on
Milestone

Comments

@ntjohnson1
Copy link
Collaborator

ntjohnson1 commented Apr 18, 2023

If a tensor of a fixed size is created when using set item it should grow to accommodate provided size. When implementing tendiag I found a few edge cases where this doesn't work properly. Will add examples later.

For a single index:

>>> import pyttb as ttb
>>> import numpy as np
>>> X = ttb.tensor.from_data(np.ones((1,1)))
>>> X[1,1,1] = 1
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<path_to_clone>\pyttb\pyttb\tensor.py", line 1322, in __setitem__
    if (newsiz != self.shape).any():
AttributeError: 'bool' object has no attribute 'any'

For a series of indices:

>>> X = ttb.tensor.from_data(np.ones((2,1)))
>>> X[np.array([[0, 1],[0, 1]])] = 1
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<path_to_clone>\pyttb\pyttb\tensor.py", line 1368, in __setitem__
    newData[idx] = self.data
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> X.data[np.array([[0, 1],[0, 1]])] # works
@ntjohnson1 ntjohnson1 added the bug Something isn't working label Apr 18, 2023
@ntjohnson1 ntjohnson1 added this to the v2.0 milestone Apr 18, 2023
@ntjohnson1 ntjohnson1 self-assigned this Apr 18, 2023
@ntjohnson1 ntjohnson1 added the doing Actively being worked on label Apr 27, 2023
@ntjohnson1
Copy link
Collaborator Author

As I started to explore this there was definitely scope creep. The tensor indexing code was some of the more complicated logic when doing the translation. I might use this ticket as an opportunity to better describe the different options and look over our tests. Depending on how big an effort that looks like may break into another ticket/open a discussion about indexing clarity.

In case this pops up as a bug that is blocking something and needs a more short term fix here is the branch that resolves the two issues above and a few more https://github.com/ntjohnson1/pyttb/tree/tensor_indexing (each commit should have a test that shows an example of something that the change now supports)

@dmdunla
Copy link
Collaborator

dmdunla commented May 1, 2023

This is quite a refactor! Since pyttb.tensor data is a np.ndarray, should we consider just directly leveraging np.array getitem/setitem methods? The current method is a refactor of the Tensor Toolbox for MATLAB tensor class. Would this simplify these methods?

@ntjohnson1
Copy link
Collaborator Author

My changes above aren't too large at the moment. The break into methods commits make no change but break the giant single function into a few helpers. That might simplify things, but we'd want sptensor to be consistent. Additionally, I don't think numpy supports dynamic growth. I put the note above to provide the update that my plan was to write up some more thoughts on this before exploring further.

@dmdunla
Copy link
Collaborator

dmdunla commented May 1, 2023

OK, looking at numpy.ndarray, I see there are helper methods, like expand_dims, append, insert, and concatenate that allow for extending arrays in different ways. But the Tensor Toolbox for MATLAB allows for updating the size, shape, and values of a tensor using slice notation. It's very convenient for working with tensors that need to be adjusted:

% Tensor Toolbox for MATLAB
>> X = tensor(ones([1 1]))
X is a tensor of size 1 x 1
	X(:,:) = 
	     1
>> X(1,1,1) = 1
X is a tensor of size 1 x 1 x 1
	X(:,:,1) = 
	     1
>> X(1,1,2) = 2
X is a tensor of size 1 x 1 x 2
	X(:,:,1) = 
	     1
	X(:,:,2) = 
	     2
>> X(1,2,2) = 2
X is a tensor of size 1 x 2 x 2
	X(:,:,1) = 
	     1     0
	X(:,:,2) = 
	     2     2
>> X([1,2],[1,2]) = 2
Error using tensor/subsasgn (line 126)
Invalid use of tensor/subsasgn
 
>> X([1:2],[1:2]) = 2
Error using tensor/subsasgn (line 126)
Invalid use of tensor/subsasgn
 
>> X([1:2],[1:2],[1:2]) = 2
X is a tensor of size 2 x 2 x 2
	X(:,:,1) = 
	     2     2
	     2     2
	X(:,:,2) = 
	     2     2
	     2     2
>> X([1,2],[1,2],[1,2]) = 3
X is a tensor of size 2 x 2 x 2
	X(:,:,1) = 
	     3     3
	     3     3
	X(:,:,2) = 
	     3     3
	     3     3
>> X = tensor(ones([1 1]))
X is a tensor of size 1 x 1
	X(:,:) = 
	     1
>> X([1:2],[1:2],[1:2]) = 2
X is a tensor of size 2 x 2 x 2
	X(:,:,1) = 
	     2     2
	     2     2
	X(:,:,2) = 
	     2     2
	     2     2

@ntjohnson1
Copy link
Collaborator Author

It's very convenient for working with tensors that need to be adjusted

Agreed it looks like this dynamic nature is useful or at least consistent with MATLAB tensortoolbox today. We mostly cover that behavior, many of my fixes in the linked branch are edge cases with changes less than ~10 lines. I think a full refactor probably isn't necessary but thinking through the interpretation of the setitem/getitem and jotting down some notes is useful (as I mentioned above we can probably take my PR above then break out more follow on).

Ultimately, I just realized the expected behavior was a little unclear to me, which means its probably additionally unclear to a new user to pyttb. Since all our algorithms implemented to date appear to be numerically correct the indexing is used consistently for common paths, but want to try and clean it up a bit for further extension.

Here is one example where I would consider the lack of clarity effectively a bug (I didn't check the MATLAB side yet):

empty_sptensor = ttb.sptensor()
empty_sptensor[np.array([[0, 1], [3, 3]])] = 5
assert np.all(empty_sptensor[0, 3] != 5)

empty_tensor = ttb.tensor()
empty_tensor[np.array([[0, 1], [3, 3]])] = 5
assert np.all(empty_tensor[0, 3] == 5)

Basically for sptensor we are treating the numpy array as a series of subs where for tensor we are treating it as [row_1, row_2], [column_1, column_2]

@dmdunla
Copy link
Collaborator

dmdunla commented May 2, 2023

Here is the MATLAB version:

>> empty_tensor = sptensor()
empty_tensor is an all-zero sparse tensor of size [empty tensor]
>> empty_tensor([1,2],[4,4]) = 5
empty_tensor is a sparse tensor of size 2 x 4 with 4 nonzeros
	(1,4)     5
	(1,4)     5
	(2,4)     5
	(2,4)     5
>> assert(empty_tensor([1,4]) ~= 5)
Assertion failed.
 
>> empty_tensor = tensor()
empty_tensor is a tensor of size [empty tensor]
	empty_tensor = []
>> empty_tensor([1,2],[4,4]) = 5
empty_tensor is a tensor of size 2 x 4
	empty_tensor(:,:) = 
	     0     0     0     5
	     0     0     0     5
>> assert(empty_tensor([1,4]) == 5)

@ntjohnson1
Copy link
Collaborator Author

Here are the different kinds of MATLAB behavior for Tensor as far as I can tell (not including expanding dimensions etc):

Tensor (MATLAB):
    Linear:
    >> b = tenones(3,3);
    >> b(1);
    >> b(5);
    Subscripts:
    >> b = tenones(3,3);
    >> b([1,1;2,2]) = 3;
    >> assert(b(1,1) == 3);
    >> assert(b(2,2) == 3);
    Subtensor:
    >> b = tenones(3,3);
    >> b([1,1], [2,2]) = 3;
    >> assert(b(1,1) ~= 3);
    >> assert(b(2,2) ~= 3);
    >> assert(b(1,2) == 3);
    ```

@ntjohnson1
Copy link
Collaborator Author

I fixed Linear for python in the branch (before it required a numpy array with a single value). Working my way through the different flavors to make sure they align with MATLAB:

>>> import pyttb as ttb
>>> b = ttb.tenones((3,3))
>>> b[0] = 5
>>> assert b[0] == 5
>>> b[0:1] = 6
>>> assert b[0:1] == 6

@ntjohnson1
Copy link
Collaborator Author

This discussion has grown long. I am going to break this into two issues. The first for the bug identified initially that dimensions weren't growing properly using set item. Then the newer issue I saw while investigating that related to making our indexing more intuitive for python. I'll probably open the PR for the first later this week and pause digging into the second.

@dmdunla
Copy link
Collaborator

dmdunla commented May 9, 2023

Basically for sptensor we are treating the numpy array as a series of subs where for tensor we are treating it as [row_1, row_2], [column_1, column_2]

From the MATLAB code, tensor.__setitem__ and sptensor.__setitem__ should behave the same (per the corresponding MATLAB tensor/subsasgn and sptensor/subsasgn.

For updating p individual elements of an n-dimensional tensor using n-tuple subscripts, you should pass a p x n array of indices into __setitem__ and the right-hand side should be a p x 1 array.

For updating p individual elements of an n-dimensional tensor using linear indices, you should pass a p x 1 array of indices into __setitem__ and the right-hand side should be a p x 1 array.

For updating a subtensor for an n-dimensional tensor, you should pass n arrays of size either 1 x 2 (for a range in that dimension) or of size 1 x 1 (for a single element in that dimension) and the right-hand side should be the appropriate size rectangular tensor. Note that "appropriate" is very MATLAB specific in this, as single dimensions in the first or last dimension of a multidimensional array are squeezed out, but for tensor classes in the Tensor Toolbox, they are not squeezed out unless the squeeze method is called explicitly.

@ntjohnson1
Copy link
Collaborator Author

The original scope of this is closed with #109

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working doing Actively being worked on
Projects
None yet
Development

No branches or pull requests

2 participants