
[RoadMap] Legacy issue resolution before 1.0 release #7319

Closed
piiswrong opened this issue Aug 2, 2017 · 31 comments

piiswrong (Contributor) commented Aug 2, 2017

We are working on multiple new features and refactors (gluon, sparse, engine, etc.) towards a 1.0 release, but there are also some legacy issues that need to be resolved. Here is a list of issues I have noted. Feel free to raise new issues or contribute fixes.

@mli @tqchen @eric-haibin-lin @reminisce @asmushetzel @jermainewang @ptrendx

Basic

  • Sort out int32, int64, index_t, mx_uint etc.
    Currently TShape uses int64_t, but the front-end and back-end interfaces still use uint32_t for indices.
    This needs cleaning up, and an int64_t interface needs to be exposed through a new set of C APIs. The general policy going forward should be to use int64_t for indices/sizes and int32_t for the number of dimensions. Signed ints should be used for function arguments unless there is a really strong reason to use unsigned.
    Most indexing-related interfaces should support negative indexing.
  • Deprecate mshadow.
    Remove usage of mshadow template-expression evaluations and replace them with Kernel::Launch or hand-written CPU/GPU kernels (a short sketch of this pattern follows after this list).
  • Verify type and shape support for operators.
    Currently some operators don't support types other than fp32 for legacy reasons. Proper type support and/or documentation should be added.
    Currently some operators have limited support for tensor ranks (a maximum of 5 dimensions). The limit needs to be increased or removed where possible.
  • Move legacy operators to the new NNVM registration.
    Conv, FC, BN, etc. should be refactored into stateless operators that use thread_local storage for cuDNN tensor descriptors.
  • Remove the experimental MKL code path and use the new storage type interface.
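
For reference, the Kernel::Launch pattern replaces an mshadow expression such as out = lhs + rhs with a per-element Map function that is launched on either CPU or GPU. A minimal sketch, simplified from the real helpers in src/operator/mxnet_op.h:

```cpp
#include "./mxnet_op.h"  // provides mxnet_op::Kernel<> and MSHADOW_XINLINE

// Sketch only: the element-wise "Map + Launch" style that should replace
// mshadow template-expression evaluations.
struct plus_kernel {
  // Computes one output element; Kernel<plus_kernel, xpu>::Launch(s, n, ...)
  // calls Map for i = 0..n-1 on CPU threads or as a CUDA kernel.
  template <typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* out,
                                  const DType* lhs, const DType* rhs) {
    out[i] = lhs[i] + rhs[i];
  }
};

// Typical usage inside an operator's compute function (s is a mshadow::Stream<xpu>*):
//   mxnet_op::Kernel<plus_kernel, xpu>::Launch(
//       s, out.Size(), out.dptr<DType>(), lhs.dptr<DType>(), rhs.dptr<DType>());
```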

Stretch goals

  • Support a pybind11 or Cython interface for a faster Python API.
  • Use a variant type instead of strings for operator argument parsing.
feiyulv commented Aug 3, 2017

We should check kAddTo support for operators.
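
For context, kAddTo is the OpReqType that asks an operator to accumulate into the output buffer (typically a gradient) instead of overwriting it. A minimal, library-agnostic sketch of a req-aware kernel, not MXNet's actual code:

```cpp
#include <cmath>
#include <cstddef>

enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical element-wise kernel that honours the request type.
void exp_forward(const float* in, float* out, std::size_t n, OpReqType req) {
  if (req == kNullOp) return;            // nothing was requested
  for (std::size_t i = 0; i < n; ++i) {
    const float v = std::exp(in[i]);
    if (req == kAddTo) {
      out[i] += v;                       // accumulate into existing values
    } else {                             // kWriteTo / kWriteInplace
      out[i] = v;
    }
  }
}
```

Operators that ignore req and always assign silently break gradient accumulation.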

piiswrong (Contributor, Author)

I saw a few complaints that MXNet doesn't support IDE code completion due to operator registration.
One solution is to generate an operator file instead of creating the operators dynamically every time the package is imported.

jmacglashan commented Aug 4, 2017

Yes, an operator file (or otherwise) to support IDE code completion would be greatly welcomed.

piiswrong (Contributor, Author)

We also need to change all the DType(expf()) calls, etc., to use math functions with the proper precision.
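
For illustration, one way to do this is to dispatch on the element type so each type keeps its native precision instead of forcing everything through expf(). The helper names below are hypothetical:

```cpp
#include <cmath>

// DType(expf(x)) forces single precision even when DType is double;
// overloading keeps each type's native precision.
inline float  precise_exp(float x)  { return std::exp(x); }  // float overload
inline double precise_exp(double x) { return std::exp(x); }  // double overload
// (half-precision types would need an extra overload computing in float)

template <typename DType>
DType exp_elem(DType x) {
  return precise_exp(x);   // instead of DType(expf(x))
}
```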

szha (Member) commented Aug 5, 2017

The default epsilon in Symbol/NDArray batch norm is too large (1e-3). Gluon now uses 1e-5, which is more commonly used.

eric-haibin-lin (Member)

kvstore has a new str interface, while the updater always uses int as the key, which is inconsistent: https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/kvstore_local.h#L83

jmacglashan

I think the biggest feature MXNet lacks is higher-order gradients (see #5699). This is probably a fairly substantial feature, but is there any plan for this, or for Hessian-vector products, for 1.0?

ptrendx (Member) commented Aug 8, 2017

For me the biggest feature MXNet lacks is consistent and complete documentation and tutorials. The Gluon tutorial seems to be pretty awesome (although still incomplete), but the rest of the API does not get such good treatment. It got even worse once you removed most examples from the website (even though I agree that they were not well explained).
From a technical and performance point of view MXNet is great (and probably the best, actually), but it's hard for it to take off when others have a lower barrier to entry and spend a lot on PR.

ZihengJiang (Contributor)

We should enable operators to request resources multiple times.
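
For context (assuming this refers to an operator requesting the same resource type, e.g. temp space, more than once): resource requests are declared through the FResourceRequest attribute, which already returns a vector, so the kind of repeated request this item asks to support would look roughly like the sketch below. Sketch only, not a working registration:

```cpp
// Assumes MXNet's nnvm operator-registration headers; the op name is hypothetical.
NNVM_REGISTER_OP(_hypothetical_op)
.set_attr<FResourceRequest>("FResourceRequest",
  [](const nnvm::NodeAttrs& attrs) {
    // Two kTempSpace entries: the repeated request this item asks to support.
    return std::vector<ResourceRequest>{ResourceRequest::kTempSpace,
                                        ResourceRequest::kTempSpace};
  });
```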

piiswrong (Contributor, Author) commented Aug 14, 2017

@ptrendx @madjam @bhavinthaker The removed tutorials need to be brought back ASAP!

asmushetzel (Contributor)

Should we also work on error handling? Basically, getting more useful and more consistent messages when a model is not built correctly by the user (shape inference fails, etc.).
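
For example, the difference between a bare check failure and an actionable message (self-contained sketch with hypothetical names, not actual MXNet code):

```cpp
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical shape-inference helper for a dense layer.
void check_dense_input(const std::vector<long>& in_shape, long expected_features) {
  if (in_shape.size() != 2 || in_shape[1] != expected_features) {
    std::ostringstream msg;
    msg << "FullyConnected: shape mismatch. Expected a 2-D input of shape (batch, "
        << expected_features << "), but got a " << in_shape.size() << "-D input";
    if (!in_shape.empty()) msg << " with trailing dimension " << in_shape.back();
    msg << ". Check the output shape of the previous layer.";
    throw std::invalid_argument(msg.str());  // vs. an opaque "Check failed: ..."
  }
}
```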

szha (Member) commented Aug 23, 2017

Some ops that are differentiable are missing gradients (e.g. 'norm').

jaanli commented Sep 7, 2017

+1 on higher-order gradients #5699

madjam (Contributor) commented Sep 7, 2017

Create appropriate namespaces so that APIs are grouped logically and do not end up with prefix qualifiers such as linalg_, random_, etc.

szha (Member) commented Sep 7, 2017

@madjam this is already being worked on by @reminisce and @eric-haibin-lin.

madjam (Contributor) commented Sep 7, 2017

@szha thanks. Is it being tracked in a separate issue?

szha (Member) commented Sep 7, 2017

@madjam I think it's already merged.

reminisce (Contributor)

@madjam Namespace refactoring is covered in PR #7604.
@eric-haibin-lin may have more details on the documentation side.

eric-haibin-lin (Member)

@madjam the docs for the separate namespaces were merged in #7712.
@piiswrong could you update the task status so that people are aware of which items have been assigned or done?

formath (Contributor) commented Sep 13, 2017

The Embedding op should be optimized for large sparse ids. Currently, the embedding layer uses the input id directly as the row index into the embedding matrix. In some circumstances, ids are generated by a uint64 hash, so this is not suitable. This feature is much needed in industrial click-through-rate prediction, recommendation systems, and other applications.
Perhaps the embedding matrix should map sparse ids to vectors and be partitioned across the parameter-server nodes by sparse_id, as TensorFlow does:

sparse_id1 → vector
sparse_id2 → vector
...

eric-haibin-lin (Member)

@formath you bring up a good point. Large indices are definitely a feature we want to support in the long term. We might want to open a separate issue to discuss this.

First of all, we do plan to add sparse support for the Embedding op, where the weight can be in row_sparse format and the gradient for the weight should be generated in row_sparse format, too. I am currently working on code refactoring and documentation, so this sparse operator is not implemented yet.

Regarding large indices of up to 64 bits: this requires the first task @piiswrong brought up regarding int types in the C API. In addition, the Kernel::Launch API in the backend uses 32-bit ints instead of 64-bit, which is problematic for many operators that operate on ndarrays with large shapes. So the scope is bigger than just the Embedding op, and it will definitely take more time to resolve.

Are you working on any industrial-scale dataset? Two ways to circumvent the 64-bit hashed-index problem come to mind:

  1. Rehash the indices into roughly 23 or 24 bits to reduce the dimensionality, which some papers claim doesn't hurt accuracy much and which doesn't break the operator in MXNet (a minimal sketch follows at the end of this comment).
  2. Preprocess the dataset to find the number of unique features and map them to contiguous indices instead.

@formath what are your thoughts on this?
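
For illustration, a minimal sketch of option 1 (the helper name and bit width are just examples, not MXNet code):

```cpp
#include <cstdint>

constexpr std::uint32_t kBits = 24;                  // target index width
constexpr std::uint32_t kMask = (1u << kBits) - 1;   // 0x00FFFFFF

// Fold a 64-bit hashed feature id into a 24-bit embedding-row index so it
// fits the existing 32-bit index paths and a reasonably sized weight matrix.
inline std::uint32_t fold_id(std::uint64_t raw_id) {
  const std::uint64_t mixed = raw_id ^ (raw_id >> 32);  // mix high/low halves
  return static_cast<std::uint32_t>(mixed & kMask);     // index in [0, 2^24)
}
```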

formath (Contributor) commented Sep 14, 2017

@eric-haibin-lin Both are OK, but they do not solve the efficiency problem when the embedding matrix has several million or even billions of rows, because of the lack of sparse updates. Those problems are the primary limits on using MXNet in industry. The recently developed sparse tensor support is big progress. I think it and its related parts should be assigned a higher priority.

eric-haibin-lin (Member)

@formath Yes, I'll work on the sparse embedding operator to support at least millions of features after I am done with the basic documentation for sparse. We do have a few sparse optimizers like SGD and Adam; Ftrl and Adagrad are coming in #7720 and #7903.

szha (Member) commented Sep 21, 2017

It would be easier if this issue were converted to a GitHub project so that the progress of each item can be tracked.

szha (Member) commented Sep 23, 2017

I have the impression that many ops don't respect grad_req.

szha (Member) commented Sep 27, 2017

Many examples are outdated or don't uphold the style standard. Duplicates of the same or similar examples (the most popular being the MNIST dataset) are everywhere.

szha (Member) commented Sep 29, 2017

Certain convolution layouts are not supported on CPU even though the API claims they are (e.g. NWC, NHWC, NDHWC).

eric-haibin-lin (Member)

All examples should be runnable. We should have a checklist for these.

szha (Member) commented Oct 2, 2017

#2944 may have other open issues.

taliesinb (Contributor)

@szha I'm wondering the same thing: the Convolution op explicitly does not support "NWC", for example, but Gluon mentions "NWC" in the docs. Searching the codebase shows that the string only occurs in the high-level docs, so are the Gluon docs simply wrong here?

Godricly (Contributor) commented Feb 5, 2018

@szha I met the same issue as @taliesinb did using Conv1D in MXNet (mxnet-cu80 1.0.0.post2). The documentation does not match the Conv1D behavior.
