
Optimize MF model #544

Merged: 8 commits merged into alibaba:master on Mar 17, 2023
Conversation

@rayrayraykk (Collaborator) commented Mar 15, 2023:

Optimize MF model forward:

  • Before: (user_embedding * item_embedding)[mask]
  • Now: (user_embedding[user_mask].to_sparse() * item_embedding[item_mask].to_sparse())[mask]

Optimization of MF

  1. Convert the MF model to an nn.Embedding version: forward with only the used embeddings (see the sketch after this list).
  2. Save client data to processed_data.pkl to save time when repeating the experiment.
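
A rough sketch of the before/after forward described above (sizes, pair indices, and mask construction are illustrative, not the PR's exact code):

    import torch

    # Hypothetical sizes; num_user/num_item/num_hidden come from the dataset in the PR.
    num_user, num_item, d = 100, 200, 16

    # Hypothetical observed (user, item) pairs and their dense mask.
    rated = torch.tensor([[0, 5], [3, 7], [9, 2]])
    mask = torch.zeros(num_user, num_item)
    mask[rated[:, 0], rated[:, 1]] = 1.0

    # Before: materialize the full (num_user, num_item) prediction matrix, then mask.
    user_embedding = torch.randn(num_user, d)
    item_embedding = torch.randn(num_item, d)
    pred_before = (user_embedding @ item_embedding.T) * mask

    # Now: nn.Embedding looks up only the rated pairs (the final form after review).
    embed_user = torch.nn.Embedding(num_user, d)
    embed_item = torch.nn.Embedding(num_item, d)
    pred_now = (embed_user(rated[:, 0]) * embed_item(rated[:, 1])).sum(dim=1)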

@rayrayraykk added the enhancement label on Mar 15, 2023
@rayrayraykk reopened this on Mar 15, 2023
@rayrayraykk (Collaborator, Author) commented Mar 15, 2023:

A CUDA error is caused by sparse tensor asserts; see pytorch/pytorch#68323 for details. (This cannot be fixed in torch 1.10.)

Code context (diff around the embedding definition):

    dtype=torch.float32))
    self.register_parameter('embed_item', self.embed_item)
    self.num_user, self.num_item = num_user, num_item
    self.embed_user = torch.nn.Embedding(num_user, num_hidden, sparse=True)
Collaborator:
I am wondering why sparse is needed?

Collaborator:

Me too. In my opinion, the label matrix for the MF task is very sparse, but the user/item embeddings and the prediction results should be dense.

Collaborator (Author):

See this discussion for details.
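
As background on what sparse=True changes (a general PyTorch fact, not from this thread): it makes the embedding weight's gradient a sparse tensor holding rows only for the looked-up indices, which in turn restricts the optimizer choice. A minimal sketch:

    import torch

    emb = torch.nn.Embedding(10, 4, sparse=True)
    loss = emb(torch.tensor([1, 3])).sum()
    loss.backward()
    print(emb.weight.grad.is_sparse)  # True: gradient rows exist only for indices 1 and 3

    # Sparse gradients work only with some optimizers, e.g. SGD or SparseAdam.
    opt = torch.optim.SparseAdam(emb.parameters(), lr=0.01)
    opt.step()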

Code context (diff in the forward function):

    dtype=torch.float32).to_dense()
    return mask * pred, label, float(np.prod(pred.size())) / len(ratings)
    device = self.embed_user.weight.device
Collaborator:
What indices and ratings are should be stated (e.g., in a docstring).


Code context:

    label = torch.tensor(np.array(ratings)).to(device)
    return pred, label
Collaborator:
When do you calculate these for negative examples?

Collaborator:
Fitting only the observed entries is OK with me, but I guess it is a rather naive baseline approach.

@joneswong (Collaborator) commented:
@DavdGao Hi Dawei, please help us check the changes to the datasets. Thanks!

@DavdGao (Collaborator) left a review:

Please see the inline comments.


Code context (dataset caching):

    self.processed_data = os.path.join(self.root, self.base_folder,
                                       'processed_data.pkl')
    if os.path.exists(self.processed_data):
Collaborator:
I don't think lines 104-105 will be executed, since a new exp directory (self.root) is created each time.

Collaborator (Author):

self.root = 'data/'
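
The caching pattern under discussion, as a minimal sketch (the class name, pickle layout, and _process helper are illustrative, not the PR's exact code; the point is that self.root is the persistent 'data/' directory, so the cache survives across runs):

    import os
    import pickle

    class MFDataset:
        def __init__(self, root='data/', base_folder='movielens'):
            self.root = root
            self.base_folder = base_folder
            self.processed_data = os.path.join(self.root, self.base_folder,
                                               'processed_data.pkl')
            if os.path.exists(self.processed_data):
                # Cache hit: reuse the processed split from a previous run.
                with open(self.processed_data, 'rb') as f:
                    self.data = pickle.load(f)
            else:
                self.data = self._process()
                os.makedirs(os.path.dirname(self.processed_data), exist_ok=True)
                with open(self.processed_data, 'wb') as f:
                    pickle.dump(self.data, f)

        def _process(self):
            # Hypothetical placeholder for the real preprocessing.
            return {'train': [], 'test': []}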

(Same code context as above: the nn.Embedding definition.)
Collaborator:
What's the difference between torch.nn.Embedding and torch.normal, and is torch.nn.Embedding much faster than torch.normal?

Collaborator (Author):
nn.Embedding lets the matmul use only the embedding rows that are actually needed, e.g.:
user_emb(users_idx) @ item_emb(item_idx).T is (n, d) x (d, n), over only the n rated pairs.
The raw implementation multiplies the full matrices: (N, d) x (d, N).
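
Concretely (sizes hypothetical): with N total users/items, d hidden dimensions, and n observed ratings, the lookup version scales with n rather than N:

    import torch

    N, n, d = 5000, 64, 32                    # hypothetical sizes
    user_emb = torch.nn.Embedding(N, d)
    item_emb = torch.nn.Embedding(N, d)
    idx_u = torch.randint(0, N, (n,))
    idx_i = torch.randint(0, N, (n,))

    full = user_emb.weight @ item_emb.weight.T           # (N, N): ~N*N*d multiplies
    batch = (user_emb(idx_u) * item_emb(idx_i)).sum(1)   # (n,):   ~n*d multiplies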


Code context (prediction in forward):

    user_embedding = self.embed_user(indices[0])
    item_embedding = self.embed_item(indices[1])
    pred = torch.diag(torch.matmul(user_embedding, item_embedding.T))
Collaborator:
Why is pred calculated with torch.diag here?

@rayrayraykk (Collaborator, Author) commented Mar 17, 2023:

Updated to: pred = (user_embedding * item_embedding).sum(dim=1)
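
The two forms agree; the update just avoids materializing the full pairwise product before taking the diagonal (a quick check, names illustrative):

    import torch

    u = torch.randn(8, 4)                   # stands in for user_embedding
    v = torch.randn(8, 4)                   # stands in for item_embedding
    old = torch.diag(torch.matmul(u, v.T))  # computes all 8x8 pairs, keeps the diagonal
    new = (u * v).sum(dim=1)                # computes only the 8 needed entries
    assert torch.allclose(old, new)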

@joneswong (Collaborator) left a review:
approved.

@joneswong joneswong merged commit c6a7de4 into alibaba:master Mar 17, 2023