[MXNET-96] Language model with Google's billion words dataset #10025
Conversation
Language model with Google's billion words dataset (#197)
@@ -214,9 +238,7 @@ If ``no_bias`` is set to be true, then the ``bias`` term is ignored.
.set_attr<nnvm::FInferShape>("FInferShape", FullyConnectedShape)
.set_attr<nnvm::FInferType>("FInferType", FullyConnectedType)
.set_attr<FCompute>("FCompute<cpu>", FullyConnectedCompute<cpu>)
#if MXNET_USE_MKLDNN == 1
Why remove this?
This is for adding sparse support for FC. See line 213.
I see. Both sparse and MKLDNN support need to go through this dispatch.
src/operator/nn/fully_connected.cc
Outdated
std::vector<TBlob> out_blobs(outputs.size());
for (size_t i = 0; i < out_blobs.size(); i++) out_blobs[i] = outputs[i].data();
FullyConnectedCompute<cpu>(attrs, ctx, in_blobs, req, out_blobs);
#endif
I think `FallBackCompute` should be used to fall back the computation to the original CPU implementation.
This block will only be executed when MKL is absent
Why do you not use FallBackCompute for the fallback?
If an input is a sparse matrix, does data() return a dense ndarray? It doesn't seem SetTBlob is doing that.
Does MKL support the kFComputeFallback dispatch mode?
Are you both referring to line 93 - line 99? FallBackCompute is only defined when USE_MKL=1. Can I still use it?
What I need to address is the following case for inference:
- data = dense
- weight = rowsparse
- bias = rowsparse
- output = dense
But I don't know how to deal with this efficiently with USE_MKL=1.
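To make the case above concrete, here is a minimal numpy sketch (a hypothetical helper, not the MXNet kernel): with a row-sparse weight, only the stored rows can produce non-zero output units, so the forward pass scatters into those columns.

```python
import numpy as np

def fc_rowsparse_forward(x, w_rows, w_idx, b_vals, b_idx, num_hidden):
    """Sketch of y = x @ W.T + b where W and b are row-sparse.
    w_rows holds only the non-zero rows of W; w_idx gives their row ids."""
    y = np.zeros((x.shape[0], num_hidden))
    # only hidden units whose weight row is stored can be non-zero
    y[:, w_idx] = x @ w_rows.T
    y[:, b_idx] += b_vals
    return y

# dense data, row-sparse weight/bias (rows 0 and 2 of W are stored)
x = np.array([[1., 2.]])
w_rows = np.array([[1., 0.], [0., 1.]])
y = fc_rowsparse_forward(x, w_rows, [0, 2], np.array([0.5]), [0], 3)
# → [[1.5, 0.0, 2.0]]
```

This is only the dense-output inference path; it sidesteps the MKL layout question entirely.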
When USE_MKL=1, I wanted to simply use the non-MKL FCForward with data.data(), weight.data(), bias.data(), assuming data.data() returns a TBlob with normal CPU layout even if data is in MKL layout. weight.data() and bias.data() should always return normal CPU layout if weight and bias are row_sparse. Please advise.
src/operator/nn/fully_connected.cc
Outdated
#endif
if (valid_data && valid_weight && valid_bias && valid_out) {
  std::vector<TBlob> in_blobs(inputs.size());
  for (size_t i = 0; i < in_blobs.size(); i++) in_blobs[i] = inputs[i].data();
Actually, if an NDArray uses the MKLDNN format, data() will convert its layout inside the array. This caused a race condition, which I just fixed. You shouldn't call data() on an NDArray with MKLDNN format. Please check FallBackCompute to see how it is handled correctly.
I also don't understand what happens if the input array is row sparse. If the row sparse array doesn't have zero-entry rows, it works fine. But if the row sparse array has zero-entry rows, isn't the memory returned from data() smaller than we expect?
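The concern about data() returning a smaller buffer can be illustrated with a toy row-sparse container in numpy (hypothetical class, not MXNet's NDArray): the stored buffer covers only the non-zero rows, so its shape disagrees with the logical shape.

```python
import numpy as np

class RowSparse:
    """Toy row-sparse array: only non-zero rows are materialized."""
    def __init__(self, dense):
        self.shape = dense.shape
        self.indices = np.flatnonzero(np.abs(dense).sum(axis=1))
        self.values = dense[self.indices]  # stored rows only
    def data(self):
        # analogous to calling data() on a row-sparse NDArray: the buffer
        # is SMALLER than the logical shape when zero rows were dropped
        return self.values
    def to_dense(self):
        out = np.zeros(self.shape)
        out[self.indices] = self.values
        return out

w = np.array([[1., 2.], [0., 0.], [3., 4.]])
rs = RowSparse(w)
# rs.shape == (3, 2) but rs.data().shape == (2, 2)
```

A kernel that blindly reads data() as a (3, 2) buffer would run past the end; a dense conversion (to_dense) is needed first.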
So the data() functionality should be improved. The developer doesn't know there's a potential issue when working with other formats of NDArray.
I guess the only option is to disable data() for NDArrays with MKLDNN format.
Please create and link a JIRA ticket
@eric-haibin-lin The RNN.unroll was just updated to support variable length sequences. Need to resolve the conflict.
@@ -181,3 +181,126 @@ def unroll(self, length, inputs, begin_state=None, layout='NTC', merge_outputs=N
outputs, _, _, _ = _format_sequence(length, outputs, layout, merge_outputs)
return outputs, states

class LSTMPCell(HybridRecurrentCell):
Can this be implemented as a ModifierCell? The implementation should be similar to ZoneOut.
Good idea
Is there an easy way to modify state in a ModifierCell (especially during unroll)?
I don't think it's a good abstraction as a modifier cell. The projection only happens on the hidden state of the LSTM, not on the cell state. I am not sure what the expected behavior would be for GRU and RNN cells, where the number of states is just one. Also, the paper calls this LSTMP; it's specific to LSTM. I think it's fine to just call it contrib.LSTMP, which inherits from HybridRecurrentCell.
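The point about projecting only the hidden state can be sketched as a single numpy step (hypothetical shapes and weight names, not the contrib cell's code): the projection Wp is applied to h after the standard LSTM update, while c keeps the full hidden size.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstmp_step(x, h, c, W, U, b, Wp):
    """One LSTMP step: a standard LSTM cell followed by a projection
    of the hidden state only; the cell state c is left untouched."""
    gates = x @ W + h @ U + b              # shape (4 * num_hidden,)
    i, f, g, o = np.split(gates, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)    # shape (num_hidden,)
    return h_new @ Wp, c_new               # h is projected, c is not
```

Note the asymmetry: the returned h has the projection size, while c keeps num_hidden, which is exactly why a generic ModifierCell (one transform over all states) is an awkward fit.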
@@ -56,7 +56,10 @@ static bool FullyConnectedShape(const nnvm::NodeAttrs& attrs,
}
SHAPE_ASSIGN_CHECK(*in_shape, fullc::kWeight, Shape2(param.num_hidden, num_input));
if (!param.no_bias) {
  SHAPE_ASSIGN_CHECK(*in_shape, fullc::kBias, Shape1(param.num_hidden));
  if (!shape_assign(&(*in_shape)[fullc::kBias], Shape1(param.num_hidden)) &&
      !shape_assign(&(*in_shape)[fullc::kBias], Shape2(param.num_hidden, 1))) {
Why should we check it against (num_hidden, 1)?
The bias is trained with sparse embedding, which requires a 2-D shape.
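A tiny sketch of the relaxed check (hypothetical helper mirroring the shape_assign pair above): the bias may arrive as (num_hidden,) from a dense parameter or as (num_hidden, 1) from a sparse embedding, and both flatten to the same 1-D vector.

```python
def bias_shape_ok(shape, num_hidden):
    """Accept either the 1-D or the 2-D column form of the bias."""
    return shape == (num_hidden,) or shape == (num_hidden, 1)

# (num_hidden, 1) reshapes losslessly to (num_hidden,), so FC can
# treat both forms identically after a flatten.
```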
example/rnn/large_word_lm/data.py
Outdated
""" A dataset for truncated bptt with multiple sentences.
Adapted from @rafaljozefowicz's implementation.
"""
def __init__(self, vocab, file_pattern, deterministic=False):
shuffle instead of deterministic, since a fixed random seed may still produce deterministic but shuffled results.
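The distinction can be shown in a few lines (hypothetical helper, not the example's loader): with `shuffle=True` and a fixed seed, the order is shuffled yet identical across runs, i.e. deterministic.

```python
import random

def order_sentences(sentences, shuffle=True, seed=0):
    """Return sentences in (optionally) shuffled order.
    A fixed seed makes the shuffled order reproducible."""
    order = list(range(len(sentences)))
    if shuffle:
        random.Random(seed).shuffle(order)
    return [sentences[i] for i in order]
```

So "deterministic" and "shuffled" are orthogonal, which is why `shuffle` is the clearer flag name.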
ok
example/rnn/large_word_lm/model.py
Outdated
def cross_entropy_loss(inputs, labels, rescale_loss=1):
    """ cross entropy loss """
    criterion = mx.gluon.loss.SoftmaxCrossEntropyLoss()
    loss = criterion.hybrid_forward(S, inputs, labels)
loss = criterion(inputs, labels) should do. Also, rescale_loss can be put in the constructor call of the SoftmaxCELoss using the argument weight.
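For reference, the effect of a constant loss weight is just a global rescale of the per-sample negative log-likelihood, which this numpy sketch (not the Gluon implementation) computes directly:

```python
import numpy as np

def softmax_ce(logits, labels, weight=1.0):
    """Per-sample softmax cross-entropy, rescaled by a constant `weight`
    (the role rescale_loss would play via the loss constructor)."""
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return weight * nll
```

Because the weight is a constant factor, setting it in the constructor and multiplying the returned loss are equivalent.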
ok
Tensor<xpu, 1, DType> bias = in_data[fullc::kBias].get_with_shape<xpu, 1, DType>(
    Shape1(wmat.shape_[0]), s);
CHECK_EQ(bias.shape_[0], wmat.shape_[0])
    << "bias.data().shape[0] != weight.data().shape[0]. Not supported by FCForward";
Use words to describe the error instead. Also, if flatten is True, the error message for data.data() might not make sense.
example/rnn/large_word_lm/data.py
Outdated
"""
def __init__(self):
    self._token_to_id = {}
    self._token_to_count = {}
collections.Counter?
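The suggestion amounts to something like this sketch (class and method names hypothetical): Counter removes the need to check for missing keys before incrementing.

```python
from collections import Counter

class Vocabulary:
    """Sketch: Counter replaces the hand-rolled count dict."""
    def __init__(self):
        self._token_to_id = {}
        self._token_to_count = Counter()   # missing tokens default to 0
    def add(self, token):
        self._token_to_count[token] += 1   # no explicit key check needed
        if token not in self._token_to_id:
            self._token_to_id[token] = len(self._token_to_id)
```

Counter also gives most_common() for free, which is handy when sorting the vocabulary by frequency.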
ok
Please add a JIRA
22edc00
to
148e345
src/operator/nn/fully_connected.cc
Outdated
// inputs
std::vector<TBlob> in_blobs(inputs.size());
auto get_data = [](const NDArray& nd) -> TBlob {
  if (nd.storage_type() == kDefaultStorage) return nd.Reorder2Default().data();
nd.Reorder2Default() returns a temporary object here, which owns a piece of memory. That memory will be freed when the function returns, and the TBlob will reference freed memory.
Anything else to address?
looks good to me
…#10025) * Language model with Google's billion words dataset (apache#197) Language model with Google's billion words dataset (apache#197) * fix lint * ffix license * patch * fix lint * cr comment * update fc fallback * fix build * fix temp memory in fc * fix compilation
Description
This example reproduces the result (~42 perplexity) of "Exploring the Limits of Language Modeling" on the GBW dataset.
See readme.mk for details. @mli @szha @zheng-da @piiswrong
Checklist
Essentials
- Code style checking (make lint)
Changes
Comments