[RoadMap] Legacy issue resolution before 1.0 release #7319
Comments
Should check kAddTo support for operators.
I saw a few complaints that MXNet doesn't support IDE code completion due to operator registration.
Yes, an operator file (or otherwise) to support IDE code completion would be greatly welcomed.
Also need to change all the DType(expf()) etc. to use math functions with proper precision.
The default epsilon in Symbol/NDArray batch norm is too large (1e-3). Gluon now uses 1e-5, which is more commonly used.
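A minimal sketch (not from the thread) contrasting the defaults described above, assuming the `eps` parameter of `mx.sym.BatchNorm` and the `epsilon` parameter of `mx.gluon.nn.BatchNorm`:

```python
import mxnet as mx

data = mx.sym.Variable('data')
# Symbol API: pass eps explicitly to get the more common 1e-5 instead of the 1e-3 default.
bn_sym = mx.sym.BatchNorm(data=data, eps=1e-5, name='bn')

# Gluon's BatchNorm already defaults to 1e-5; shown explicitly here for comparison.
bn_blk = mx.gluon.nn.BatchNorm(epsilon=1e-5)
```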
kvstore has a new str interface, while the updater always uses int as the key, which is not consistent. https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/kvstore_local.h#L83
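A small sketch of the mismatch being reported, using the local kvstore. It assumes the private `_set_updater` helper (used here only to observe the callback) and that the behavior is as described above; this is illustrative, not a definitive repro.

```python
import mxnet as mx

kv = mx.kv.create('local')
shape = (2, 3)
kv.init('weight', mx.nd.zeros(shape))            # string key on the front end

def updater(key, grad, weight):
    # Per the report, 'key' may arrive as an int here even though the string
    # key 'weight' was used for init/push/pull.
    print('updater received key:', key, type(key))
    weight[:] -= 0.1 * grad

kv._set_updater(updater)                          # private helper, shown only to illustrate
kv.push('weight', mx.nd.ones(shape))
out = mx.nd.zeros(shape)
kv.pull('weight', out=out)
print(out.asnumpy())
```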
I think the biggest feature mxnet lacks is higher-order gradients (see #5699). This is probably a fairly substantial feature, but is there any plan for this or Hessian-vector products for 1.0?
For me the biggest feature mxnet lacks is consistent and full documentation and tutorials. The Gluon tutorial seems to be pretty awesome (although still incomplete), but the rest of the API does not get such good treatment. It got even worse once you removed most examples from the website (even though I agree that they were not well explained).
Should enable requesting resources multiple times.
@ptrendx @madjam @bhavinthaker The removed tutorials need to be brought back ASAP!
Should we also work on error handling? Basically, getting more useful and more consistent messages when a model is not built correctly by the user (shape inference fails, etc.).
Ops that are differentiable are missing gradients (e.g. 'norm').
+1 on higher-order gradients #5699
Create appropriate namespaces so that APIs are grouped logically and do not end up with prefix qualifiers such as linalg_, random_, etc.
@madjam this is already being worked on by @reminisce and @eric-haibin-lin.
@szha thanks. Is it being tracked in a separate issue?
@madjam I think it's already merged.
@madjam Namespace refactoring is covered in this PR: #7604
@madjam the docs for the separate namespaces are merged in #7712.
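A small sketch of the grouped namespaces referenced above, assuming the `mx.nd.random` and `mx.nd.linalg` submodules introduced by the refactor (replacing prefix-qualified names):

```python
import mxnet as mx

a = mx.nd.random.uniform(low=0, high=1, shape=(2, 2))    # previously mx.nd.random_uniform
b = mx.nd.random.normal(loc=0, scale=1, shape=(2, 2))    # previously mx.nd.random_normal
c = mx.nd.linalg.gemm2(a, b)                              # previously mx.nd.linalg_gemm2
print(c.shape)
```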
The Embedding op should be optimized for large sparse ids. Currently, the embedding layer uses the input id as the raw index into the embedding matrix. In some circumstances ids may be generated using a uint64 hash, so this is not suitable. This feature is much needed in industrial click-through-rate prediction, recommendation systems, and other uses.
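A minimal sketch of the current behavior being described, assuming the `mx.nd.Embedding` operator signature (data, weight, input_dim, output_dim): each input id is used directly as a row index, so ids must lie in [0, input_dim) and a 64-bit hashed id cannot be used directly.

```python
import mxnet as mx

input_dim, output_dim = 10, 4                     # a vocabulary of only 10 rows
weight = mx.nd.random.uniform(shape=(input_dim, output_dim))
ids = mx.nd.array([0, 3, 9])                      # ids are used directly as row indices
emb = mx.nd.Embedding(data=ids, weight=weight,
                      input_dim=input_dim, output_dim=output_dim)
print(emb.shape)                                  # (3, 4)
# An id such as 2**40 (e.g. from a uint64 hash) would require an impossibly large weight matrix.
```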
@formath you bring up a good point. Large indices are definitely a feature we want to support in the long term. We might want to open a separate issue and discuss this. First of all, we do plan to add sparse support for the Embedding op, where the weight can be in row_sparse format and the gradient for the weight is generated in row_sparse format, too. I am currently working on code refactoring and documentation, so this sparse operator is not implemented yet. Regarding large indices up to 64 bits, this requires the first task @piiswrong brought up regarding int types in the C API. Are you working on any industrial-scale dataset? Two ways to circumvent the 64-bit hashed-index problem come to mind:
@eric-haibin-lin Both are OK. But it does not solve the efficiency problem when the embedding matrix has several million or even billions of rows, because of the lack of sparse updates. Those problems are the primary limits on using mxnet in industry. The sparse tensor support developed recently is a big step forward. I think it and its accompanying pieces should be assigned a higher priority.
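A minimal sketch of the row_sparse storage the comment refers to, assuming `mx.nd.sparse.row_sparse_array`: only the rows that are actually touched are stored, which is what makes sparse gradient updates feasible for embedding matrices with millions of rows.

```python
import mxnet as mx

shape = (1000000, 4)                              # a large embedding matrix
indices = mx.nd.array([2, 7, 999999])             # only three rows were touched
data = mx.nd.ones((3, 4))                         # values for those rows
grad = mx.nd.sparse.row_sparse_array((data, indices), shape=shape)
print(grad.stype, grad.indices.asnumpy())         # row_sparse [2 7 999999]
```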
It would be easier if this issue were converted to a GitHub project so that progress on each item can be tracked.
I have the impression that many ops don't respect grad_req.
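One possible way to probe whether an operator honors grad_req='add' (i.e. kAddTo, mentioned earlier in the thread) is sketched below; it assumes `attach_grad(grad_req='add')` and simply checks that gradients accumulate across two backward passes instead of being overwritten.

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad(grad_req='add')                     # request gradient accumulation (kAddTo)
for _ in range(2):
    with autograd.record():
        y = mx.nd.sin(x)                          # substitute the operator under test here
    y.backward()
print(x.grad.asnumpy())                           # expect 2 * cos(x) if accumulation is honored
```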
Many examples are outdated or don't uphold the style standard. Duplicates of the same or similar examples (the most popular being the MNIST dataset) are omnipresent.
Certain convolution layouts on CPU are not supported even though the API claims they are (e.g. NWC, NHWC, NDHWC).
All examples should be runnable. We should have a checklist for these.
#2944 may have other open issues.
@szha I'm wondering the same thing: the Convolution op explicitly does not support "NWC", for example, but gluon mentions "NWC" in the docs. Searching the codebase shows that string only occurs in the high-level docs, so are the gluon docs simply wrong here?
@szha I met the same issue as @taliesinb did using conv1d in mxnet (mxnet-cu80 1.0.0.post2).
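A minimal repro sketch of the layout complaint above, assuming the low-level `mx.nd.Convolution` operator's `layout` parameter and `mx.base.MXNetError`; passing 'NWC' is expected to be rejected on CPU, which is the behavior being reported.

```python
import mxnet as mx

x = mx.nd.random.uniform(shape=(1, 20, 4))        # batch, width, channels (NWC order)
w = mx.nd.random.uniform(shape=(8, 4, 3))         # num_filter, channels, kernel
try:
    y = mx.nd.Convolution(data=x, weight=w, no_bias=True,
                          kernel=(3,), num_filter=8, layout='NWC')
except mx.base.MXNetError as err:
    print('NWC rejected by the Convolution op:', str(err).splitlines()[0])
```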
We are working on multiple new features and refactors (gluon, sparse, engine, etc.) towards a 1.0 release. But there are also some legacy issues that need to be resolved. Here is a list of issues I have noted. Feel free to raise new issues or contribute fixes.
@mli @tqchen @eric-haibin-lin @reminisce @asmushetzel @jermainewang @ptrendx
Basic
Currently TShape uses int64_t, but the front-end and back-end interfaces still use uint32_t for indices.
This needs cleaning up, and an int64_t interface needs to be exposed through a new set of C APIs. The general policy going forward should be to use int64_t for indices/sizes and int32_t for the number of dimensions. Signed ints should be used for function arguments unless there is a really strong reason to use unsigned.
Most indexing-related interfaces should support negative indexing.
Remove usage of mshadow template evaluations and replace them with Kernel::Launch or hand-written CPU/GPU kernels.
Currently some operators don't support types other than fp32 for legacy reasons. Proper type support and/or documentation should be added.
Currently some operators have limited support for tensor ranks (maximum 5 dims). The limit needs to be increased or removed if possible.
Conv, FC, BN, etc. should be refactored into stateless operators and use thread_local to store cuDNN tensor descriptors.
Stretch goals