[Tensor] Refactorize Tensor Class to TensorV2 @open sesame 03/26 12:26 #2500
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2500. Please follow the 1 commit / 1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.
Force-pushed from d37196f to b86dc2a
cibot: @djeong20, A builder checker could not be completed because one of the checkers is not completed. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2500-202403081422460.15307211875916-b86dc2ac092b931ae8f75ce2bb55d4d3439f18ad/.
cibot: @djeong20, A builder checker could not be completed because one of the checkers is not completed. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2500-202403081634500.4846498966217-b86dc2ac092b931ae8f75ce2bb55d4d3439f18ad/.
Force-pushed from b86dc2a to 65f6483
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from 65f6483 to e4232d2
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
I think you just need to resolve the conflict!👍
```diff
 if (m.size() > this->size())
   throw exception::not_supported("broadcasting *this is not supported");

 const TensorDim m_dim = m.getDim();

-BroadcastInfoV2 e;
+BroadcastInfo e;
```
Is there a reason why BroadcastInfo's variable name is e?
No particular reason! I just reused the previous name.
Force-pushed from ad6b019 to 27fe176
Force-pushed from 27fe176 to acb6d4c
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
LGTM!
nntrainer/tensor/tensor.cpp (Outdated)

```cpp
Tdatatype dtype = tensors.front().getDim().getDataType();

if (dtype == Tdatatype::FP32) {
  output = FloatTensor::cat(tensors, axis);
```
Do we need this if branch?
I believe so. As far as I know, this function concatenates a vector of tensors along a given axis.
However, the problem with this function is that the tensors are concatenated to the first tensor of the list, not to the tensor that calls the function. As a result, an FP16 tensor can end up concatenating a list of FP32 tensors.
Since the input tensors have no relation to the tensor that calls this function, we need a decision point on whether to concatenate a list of FP32 tensors or FP16 tensors, and that decision should be made in the Tensor class.
I think there is plenty of room to improve the function itself, and we can redesign it later.
But we could change the API to something like tensors[0]->itensor.cat(tensors, axis), since the output data type is going to be the type of tensors[0] anyway.
We cannot use it as tensors[0]->itensor.cat(tensors, axis), since itensor is a private member. I'll modify this in another way and let you know.
I fixed this by creating a new member function Tensor::concat(), and the existing static function Tensor::cat() now uses concat(). This should work exactly the same as tensors[0]->itensor.cat(tensors, axis) would.
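
A minimal sketch of that delegation, with assumed signatures (the actual nntrainer declarations may differ):

```cpp
#include <memory>
#include <vector>

class TensorBase; // concrete FloatTensor / HalfTensor derive from this

class Tensor {
public:
  // Member function: *this decides how the list is concatenated, so the
  // FP32-vs-FP16 decision point stays with the calling tensor.
  Tensor concat(const std::vector<Tensor> &tensors, int axis) const;

  // Static entry point kept for compatibility: delegates to the first
  // tensor's member function, matching what
  // tensors[0]->itensor.cat(tensors, axis) would do without exposing itensor.
  static Tensor cat(const std::vector<Tensor> &tensors, int axis) {
    return tensors.front().concat(tensors, axis);
  }

private:
  std::shared_ptr<TensorBase> itensor; // private, hence not callable directly
};
```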
nntrainer/tensor/tensor.cpp (Outdated)

```cpp
  itensor->copy(from);
} else {
  // replace with a new tensor that is the same as the given tensor
  if (from.getDataType() == ml::train::TensorDim::DataType::FP32) {
```
Can we remove this if branch by calling itensor.copy?
I left this code intentionally in the Tensor class for the following reasons.
- This condition checks the data type of the tensor to copy. Even if the code is moved into FloatTensor/HalfTensor, we would still need to check the source tensor's data type, which would end up duplicating the same code in both classes.
- The current implementation uses swap() to perform the copy. However, that is not a valid option inside the FloatTensor/HalfTensor classes, because swap(t, *this) would become swap(Tensor &lhs, FloatTensor &rhs) or swap(Tensor &lhs, HalfTensor &rhs) instead of swap(Tensor &lhs, Tensor &rhs). While Tensor has a FloatTensor as a member variable, I believe implementing a swap between Tensor and FloatTensor would require a structural change.
Hope this clarifies; let me know if you have further questions.
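
To illustrate the signature mismatch described above (a sketch, not actual code):

```cpp
// The only swap defined is between two Tensor objects:
void swap(Tensor &lhs, Tensor &rhs) noexcept;

// Inside FloatTensor, a call such as swap(t, *this) would need an overload
// like swap(Tensor &lhs, FloatTensor &rhs), which does not exist without a
// structural change to the class hierarchy.
```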
Also, if you have any suggestions, please bring them up!
How about using a template? If we need to add more tensor types like int8 or bf16, there will be more and more branches.
On second thought, we can just remove this branch and create a Tensor with the copy constructor.
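
A minimal sketch of that idea, under assumed signatures (illustrative, not the merged code):

```cpp
// Instead of branching on FP32/FP16, let the copy constructor pick the
// concrete tensor type, then swap the temporary into *this.
void Tensor::copy(const Tensor &from) {
  if (getDim() == from.getDim() && getDataType() == from.getDataType()) {
    itensor->copy(from); // same shape and type: direct element copy
  } else {
    Tensor t(from);  // copy constructor creates a FloatTensor or HalfTensor
    swap(t, *this);  // replace *this with the freshly constructed copy
  }
}
```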
fixed!
nntrainer/tensor/tensor.cpp
Outdated
for (unsigned int c = 0; c < t.channel(); ++c) { | ||
for (unsigned int h = 0; h < t.height(); ++h) { | ||
for (unsigned int w = 0; w < t.width(); ++w) { | ||
if (getDataType() == ml::train::TensorDim::DataType::FP32) { |
Ditto.
Same issue as copy().
It's minor, but how about moving the if outside of the loop?
Also, it seems a little strange: getDataType() returns this tensor's data type, yet it is applied when we get the value of from. Should it be from.getDataType()?
And if the basic assumption is that both tensor types are the same, then there is no need to consider the type at all.
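
For illustration, a sketch of the suggested hoisting (loop bodies elided; names taken from the hunk above):

```cpp
// Check the data type once instead of re-testing it for every element.
if (from.getDataType() == ml::train::TensorDim::DataType::FP32) {
  for (unsigned int c = 0; c < t.channel(); ++c)
    for (unsigned int h = 0; h < t.height(); ++h)
      for (unsigned int w = 0; w < t.width(); ++w) {
        // FP32 element copy here
      }
} else {
  // FP16 loop nest here
}
```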
this is resolved now. thanks for pointing it out!
```cpp
void deallocate() {
  data = nullptr;
  offset = 0;
  ...

template <typename T = float> T *getData() const {
```
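
For reference, a hypothetical usage of the templated accessor shown above (_FP16 denotes nntrainer's half-precision type; the exact guard macro is an assumption):

```cpp
float *f32 = t.getData();           // default template argument: float
#ifdef ENABLE_FP16
_FP16 *f16 = t.getData<_FP16>();    // half-precision buffer
#endif
```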
I think eventually we should remove these getData() and getAddress() functions.
Totally. We should first remove getData() and getAddress() usage in layers (and other places), then deprecate both functions.
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
```cpp
/**
 * @brief Copy the Tensor
 * @param[in] input Tensor to be copied
 * @param[out] output output Tensor
 */
void copy_with_stride(const Tensor &input, Tensor &output) override;
```
What does the stride stand for? (Is an implicit stride applied only when the data types are heterogeneous?) Is it used only to copy tensors of different types? Could you add more explanation to the comment? If this copy is used only when the tensor types differ, could you also consider changing the function name? As it is, it sounds like it supports a strided copy from any tensor with any stride options.
copy_with_stride() is not used to copy tensors with different data types; the function assumes it only takes the same data type. copyData() should be the function for copying between different data types.
I'm not sure what the initial purpose of this function was, since there isn't much description in #1300, but it is currently used for copying non-contiguous tensors, which in most cases means copying shared-data tensors. We should discuss whether to keep the name or change it to a better one.
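
A conceptual sketch of what copying a non-contiguous tensor entails (not the actual implementation; the setValue/getValue signatures are assumed):

```cpp
// Iterate over logical indices so strided (non-contiguous) source layouts
// are read correctly, rather than memcpy-ing a raw contiguous buffer.
for (unsigned int b = 0; b < input.batch(); ++b)
  for (unsigned int c = 0; c < input.channel(); ++c)
    for (unsigned int h = 0; h < input.height(); ++h)
      for (unsigned int w = 0; w < input.width(); ++w)
        output.setValue(b, c, h, w, input.getValue(b, c, h, w));
```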
Also, I'm working on a new PR for adding descriptions and unifying/changing naming. Please let me know if other functions seem unclear or need more explanations.
I see. Thank you for clarifying my misunderstanding!
This commit deprecates the existing TensorV2 class and replaces the Tensor class with the new TensorV2 implementation. The previous Tensor class has been removed, and all of its usages have been updated to use the TensorV2 class. Additionally, all remaining instances of the TensorV2 name within NNTrainer have been removed.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>

This commit fixes several issues that arose from the refactoring of the Tensor class.
**Changes proposed in this PR:**
- The copy constructor has been implemented to prevent incorrect behavior of the default copy constructor.
- Tensor add_i() has been newly implemented to fix the previous incorrect implementation.
- Add a chain() function that returns a LazyTensor.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>

This commit updates recently added tensor features, including add_i_partial() and ele_mul(). The newly added functions have been implemented according to the revised tensor structure.
**Changes proposed in this PR:**
- Update the Float/HalfTensor classes with the newly added function add_i_partial().
- Apply BLAS operations in Tensor's basic arithmetic operations.
- Height-width transpose in half-precision can be SIMD-accelerated.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>

This commit moves several operation implementations into each concrete tensor class for easier management. This allows users to create a tensor with a new data type without unnecessary modifications to the Tensor class.
**Changes proposed in this PR:**
- The static function Tensor::cat() uses each tensor's member function concat().
- The Tensor::copy() logic is simplified by not differentiating by data type.
- Tensor::copy_with_stride() uses an internal function to operate.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>

This PR fixes undefined-symbol issues with one of the tensor constructors. The function implementation is moved to the header file to resolve the issue.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>

This PR resolves warnings that occur during the Android build.
**Changes proposed in this PR:**
- Mark functions that override virtual functions with the override specifier.
**Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <[email protected]>
Force-pushed from 469121c to 7a021c6
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
LGTM! Thanks for the great work!
This pull request introduces a major refactoring of the Tensor class in our codebase by introducing the new TensorV2 class.
The previous Tensor class is removed, and all instances where it was used have been updated accordingly.
With this change, the overall functionality and stability of the system are expected to improve.
Self-evaluation: