ggml : remove GGML_MAX_NODES limit #567
My approach for this would be to create the graph for just 1 cell. For each time step, we simply update the input and states, call `ggml_graph_compute_with_ctx`, and get the output. That way, the graph's size doesn't grow with the number of time steps. Example:
cc @PABannier
Yes, we discussed that in #467! But for large sequences, isn't the overhead of building the graph per cell slowing this down dramatically? @datduonguva Have you tried building a general graph by setting …
@PABannier They build the graph only once and then, for each iteration, they put different input in the input tensor, read the output from the output tensor, and so on. I don't know if this is valid for all LSTM models, but this specific one has a fixed graph that does not change its shape with time. This approach does not have the overhead of building the graph each time, but it does have the overhead of starting and stopping the thread pool for every iteration.
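For illustration, here is a minimal sketch of that single-cell pattern using the old stack-based graph API that existed at the time. `lstm_cell_forward`, the tensor names, and the thread count are placeholders I introduced, not ggml API:

```c
#include <string.h>
#include "ggml.h"

// Placeholder: builds one LSTM cell from ggml ops (mul_mat, add, activations, ...)
// and returns the new hidden state tensor. Not part of ggml.
struct ggml_tensor * lstm_cell_forward(struct ggml_context * ctx,
                                       struct ggml_tensor * x,
                                       struct ggml_tensor * h_prev,
                                       struct ggml_tensor * c_prev);

void run_sequence(struct ggml_context * ctx, const float * inputs, int n_steps, int n_embd) {
    struct ggml_tensor * x      = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd);
    struct ggml_tensor * h_prev = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd);
    struct ggml_tensor * c_prev = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd);

    // build the graph for a single cell only once - its size does not depend on n_steps
    struct ggml_tensor * h_out = lstm_cell_forward(ctx, x, h_prev, c_prev);
    struct ggml_cgraph   gf    = ggml_build_forward(h_out);

    for (int t = 0; t < n_steps; ++t) {
        // overwrite the input (and, in a full implementation, the state tensors)
        memcpy(x->data, inputs + t*n_embd, n_embd*sizeof(float));

        ggml_graph_compute_with_ctx(ctx, &gf, /*n_threads=*/4);

        // feed the output back as the next hidden state
        memcpy(h_prev->data, h_out->data, n_embd*sizeof(float));
    }
}
```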
Let's fix the `GGML_MAX_NODES` limit.

If we take the ideas from #467 and drop the requirement for having graphs on the stack:

```c
struct ggml_cgraph {
    int n_nodes;
    int n_leafs;

    // by default we work on the stack
    struct ggml_tensor * nodes;
    struct ggml_tensor * grads;
    struct ggml_tensor * leafs;

    void ** visited_hash_table;

    // performance
    int     perf_runs;
    int64_t perf_cycles;
    int64_t perf_time_us;
};
```

We allow creation of new graphs only through:

```c
GGML_API struct ggml_cgraph * ggml_new_graph         (struct ggml_context * ctx, int n_nodes_max);
GGML_API struct ggml_cgraph * ggml_build_forward_ctx (struct ggml_context * ctx, struct ggml_tensor * tensor, int n_nodes_max);
```

I.e. all graphs are always created inside a `ggml_context`.

Deprecate `ggml_build_forward()`:

```c
// old
gf = ggml_build_forward(x);

// new
gf = ggml_build_forward_ctx(ctx, x, n_nodes_max);
```

The proposed `ggml_graph_view()`:

```c
struct ggml_cgraph ggml_graph_view(struct ggml_cgraph * cgraph, int n0, int n1);
```

I think leafs always have to be "viewed in full", i.e. only the nodes/grads can be split? The hash table can be reused by the views, I think? Probably OK to return graph views on the stack?

Probably need to extend:

```c
GGML_API size_t ggml_graph_overhead(int n_nodes_max);
```
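To visualize the intent, here is a rough usage sketch of the proposal above. The signatures and the node-range split are the ones proposed in this comment, not a shipped API:

```c
// Sketch only: ggml_build_forward_ctx / ggml_graph_view are the proposed signatures above.
struct ggml_cgraph * gf = ggml_build_forward_ctx(ctx, out, /*n_nodes_max=*/8192);

// split the node range in two and compute each half separately
struct ggml_cgraph view0 = ggml_graph_view(gf, 0, gf->n_nodes/2);
struct ggml_cgraph view1 = ggml_graph_view(gf, gf->n_nodes/2, gf->n_nodes);

ggml_graph_compute_with_ctx(ctx, &view0, /*n_threads=*/4);
ggml_graph_compute_with_ctx(ctx, &view1, /*n_threads=*/4);
```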
"Deprecate" usually means that there would be a transition period where both APIs would work, but I am not sure that would be possible here. Do you actually mean that, or just removing `ggml_build_forward()`?
For the purposes of …
The hashtable is only used to avoid revisiting nodes while expanding the graph, and views shouldn't be allowed to be expanded since they would just overwrite the parent's memory. I think it would be OK to just set the hash table to `NULL` in the views.
It's probably OK, but it doesn't really solve anything, and it may reduce options in the future. I.e. we may need to change something that would require dynamic memory allocation even on views. So I think the safer route would be to also allocate views in a context. Should we do something to avoid allocating the list of grads when it is not needed?
Remove it entirely. We will add a comment in the header explaining how to migrate, so it is easier for 3rd-party projects.
Hm, interesting.
OK
Yes, probably a flag that the user sets when they know that backward passes will be computed.
Most of the changes to … I propose keeping a simple graph interface:

```c
// Graph interface
GGML_API void                 ggml_build_forward_expand (struct ggml_cgraph * cgraph, struct ggml_tensor * tensor);
GGML_API void                 ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);

GGML_API struct ggml_cgraph * ggml_new_graph        (struct ggml_context * ctx); // = ggml_new_graph_custom(GGML_GRAPH_DEFAULT_SIZE, true)
GGML_API struct ggml_cgraph * ggml_new_graph_custom (struct ggml_context * ctx, size_t size, bool grads);
GGML_API void                 ggml_graph_reset      (struct ggml_cgraph * graph);
GGML_API struct ggml_cgraph * ggml_graph_view       (struct ggml_context * ctx, struct ggml_cgraph * cgraph, int i0, int i1);

// Previously, graphs could be copied simply with the operator =, but this does not work anymore.
// Restore this functionality with these functions:
GGML_API void                 ggml_graph_cpy(struct ggml_cgraph * src, struct ggml_cgraph * dst);
GGML_API struct ggml_cgraph * ggml_graph_dup(struct ggml_context * ctx, struct ggml_cgraph * src); // = ggml_graph_cpy(src, ggml_new_graph(...))

GGML_API size_t ggml_graph_overhead();        // = ggml_graph_overhead_custom(GGML_GRAPH_DEFAULT_SIZE, true)
GGML_API size_t ggml_graph_overhead_custom(size_t size, bool grads);

// Removed:
// GGML_API struct ggml_cgraph * ggml_build_forward_ctx (struct ggml_context * ctx, struct ggml_tensor * tensor, size_t size);
// GGML_API struct ggml_cgraph * ggml_build_backward_ctx(struct ggml_context * ctx, struct ggml_cgraph * gf, bool keep);
//
// Instead:
//   struct ggml_cgraph * gf = ggml_new_graph(ctx);
//   ggml_graph_reset(gf); // ggml_build_forward() also resets the graph, add this function to support this
//   ggml_build_forward_expand(gf, tensor);
//
//   struct ggml_cgraph * gb = ggml_new_graph(ctx);
//   ggml_build_backward_expand(ctx, gf, gb, keep);
```

I noticed that the graph reset functionality in …
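As a usage sketch of the overhead helpers in this proposal: the caller sizes a `ggml_context` large enough to hold the graph metadata before creating the graph. The constant name and exact signatures are taken from the proposal above, so treat them as assumptions rather than a final API:

```c
// Sketch: reserve enough context memory for the graph structure itself.
// GGML_GRAPH_DEFAULT_SIZE and the helper signatures follow the proposal above.
size_t buf_size = ggml_graph_overhead() + ggml_tensor_overhead()*GGML_GRAPH_DEFAULT_SIZE;

struct ggml_init_params params = {
    /*.mem_size   =*/ buf_size,
    /*.mem_buffer =*/ NULL,
    /*.no_alloc   =*/ true,   // tensor data is allocated elsewhere (e.g. by an allocator)
};

struct ggml_context * ctx = ggml_init(params);
struct ggml_cgraph  * gf  = ggml_new_graph(ctx);
```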
By default, we probably don't want to have …

For the graph resetting in …

Below is a short description I wrote about the existing …

The purpose of this function is to just set the gradient contents to 0. The …

Other than that, I think the API is good.
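To make the grads flag and the reset semantics concrete, here is a hedged sketch of how a training-style loop might look under this proposal; `loss`, `n_iters`, and the thread count are placeholders, and the signatures follow the proposed API rather than anything final:

```c
// Sketch: request gradient storage explicitly and zero it between iterations.
struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, GGML_GRAPH_DEFAULT_SIZE, /*grads=*/true);
ggml_build_forward_expand(gf, loss);

struct ggml_cgraph * gb = ggml_new_graph_custom(ctx, GGML_GRAPH_DEFAULT_SIZE, /*grads=*/true);
ggml_build_backward_expand(ctx, gf, gb, /*keep=*/false);

for (int iter = 0; iter < n_iters; ++iter) {
    ggml_graph_reset(gf);                                    // set gradient contents to 0
    ggml_graph_compute_with_ctx(ctx, gb, /*n_threads=*/4);   // forward + backward
    // ... optimizer update on the parameters ...
}
```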
I didn't realize that there is already a function called `ggml_graph_reset`.
We should be able to create graphs with a large number of nodes. One application is support for LSTM networks:
#467
The exact approach for implementing this is not currently clear, but some ideas and discussion are available in the linked issue above.