ggml : reduce hash table reset cost #8698
It seems that older versions of gcc are not capable of understanding that code after
To solve the implicit fallthrough warnings, maybe we can add a GGML_ABORT macro like this:
diff --git a/ggml/include/ggml.h b/ggml/include/ggml.h
index d0311f6c..527dadfb 100644
--- a/ggml/include/ggml.h
+++ b/ggml/include/ggml.h
@@ -272,7 +272,8 @@
#define GGML_NORETURN _Noreturn
#endif
-#define GGML_ASSERT(x) if (!(x)) ggml_abort(__FILE__, __LINE__, #x)
+#define GGML_ABORT(x) ggml_abort(__FILE__, __LINE__, #x)
+#define GGML_ASSERT(x) if (!(x)) GGML_ABORT(x)
// used to copy the number of elements and stride in bytes of tensors into local variables.
// main purpose is to reduce code duplication and improve readability.
diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
index 3fd9c4fe..a6cbb0ae 100644
--- a/ggml/src/ggml.c
+++ b/ggml/src/ggml.c
@@ -17924,7 +17924,7 @@ static void ggml_compute_backward(struct ggml_context * ctx, struct ggml_tensor
} break;
case GGML_UNARY_OP_TANH:
{
- GGML_ASSERT(false); // TODO: not implemented
+ GGML_ABORT("not implemented");
}
case GGML_UNARY_OP_ELU:
{
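The idea behind this suggestion can be sketched in isolation: if the abort helper is declared as never returning, the compiler knows control cannot reach the next `case`, so no `break` is needed and no fallthrough warning should be raised. The sketch below is a minimal illustration, not the code from this PR. The names `sketch_abort`, `SKETCH_ABORT`, `SKETCH_ASSERT` and `sketch_op_arity` are hypothetical stand-ins, and it assumes a C11 compiler (for `_Noreturn`) and a printf-style message, as in the later "GGML_ABORT use format string" commit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ggml_abort: prints "file:line: message" and never returns. */
_Noreturn static void sketch_abort(const char * file, int line, const char * fmt, ...) {
    va_list args;
    fprintf(stderr, "%s:%d: ", file, line);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    abort();
}

/* Hypothetical macros modelled on the GGML_ABORT / GGML_ASSERT split discussed above. */
#define SKETCH_ABORT(...) sketch_abort(__FILE__, __LINE__, __VA_ARGS__)
#define SKETCH_ASSERT(x)  do { if (!(x)) SKETCH_ABORT("SKETCH_ASSERT(%s) failed", #x); } while (0)

enum sketch_op { SKETCH_OP_NEG, SKETCH_OP_ADD, SKETCH_OP_TANH };

/* Returns the number of operands of an op; aborts for ops that are not implemented. */
static int sketch_op_arity(enum sketch_op op) {
    int arity = 0;
    switch (op) {
        case SKETCH_OP_NEG:
            {
                arity = 1;
            } break;
        case SKETCH_OP_ADD:
            {
                arity = 2;
            } break;
        case SKETCH_OP_TANH:
            {
                /* no break: sketch_abort is _Noreturn, so this cannot fall through */
                SKETCH_ABORT("%s: not implemented", "TANH");
            }
        default:
            {
                SKETCH_ABORT("unknown op %d", (int) op);
            }
    }
    return arity;
}
```

Passing the message through `__VA_ARGS__` rather than stringifying it with `#x` avoids printing the quotes of a string literal and lets the caller add context to the message.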
While working on this, I also found that building the graphs in the
(*) add tensors to the list of nodes as they are created, rather than doing a DFS in
So the conclusion is that this could be improved significantly, reducing the need for complicated logic to reuse the graphs as in #8366. Simply removing the calls to
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
Reduces the reset cost of hash tables by using a bit table to indicate whether a slot is in use, instead of a NULL pointer. In this way, the amount of memory that needs to be cleared when resetting a hash table is reduced by a factor of 64 (32 on 32-bit systems). The difference in performance is only measurable in small models, but with very small models like stories260K it can be significant. Since the CUDA backend supports async computation, the hash table resets could be done in parallel with the computation, hiding most of this latency. However, in the case of the stories260K model the reset was more expensive than the computation.

Other changes included in this PR:
* Changed GGML_ASSERT to move most of the code to an external function call, ggml_abort, to reduce code size
* Added GGML_ABORT to abort the process with a message
* Replaced GGML_ASSERT(false) with GGML_ABORT
* Added GGML_NORETURN to tag functions that do not return, used in ggml_abort
* GGML_ABORT will generate an "unreachable code" warning in clang (gcc does not have this warning). I have fixed these warnings in a separate commit to simplify the review.
* Added -g to all the make release builds to include debug symbols. This should not affect performance, but will improve the accuracy of call stacks.
* ggml_backend_sched
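To make the bit-table idea from the description concrete, here is a minimal sketch of a pointer hash set whose reset only clears one bit per slot. It is an illustration under stated assumptions, not the ggml implementation: every `sketch_*` name is hypothetical, and the 32-bit bitset word, the linear probing and the pointer hash are choices made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t sketch_bitset_t;   /* hypothetical: each word tracks 32 slots */
#define SKETCH_BITS 32u

struct sketch_hash_set {
    size_t            size;  /* number of slots */
    sketch_bitset_t * used;  /* one bit per slot: 1 = occupied */
    const void     ** keys;  /* keys[i] is only meaningful when bit i is set */
};

static size_t sketch_bitset_words(size_t n) {
    return (n + SKETCH_BITS - 1) / SKETCH_BITS;
}

static struct sketch_hash_set sketch_hash_set_new(size_t size) {
    struct sketch_hash_set hs;
    assert(size > 0);
    hs.size = size;
    hs.used = calloc(sketch_bitset_words(size), sizeof(sketch_bitset_t));
    hs.keys = malloc(size * sizeof(const void *)); /* never needs clearing */
    assert(hs.used != NULL && hs.keys != NULL);
    return hs;
}

/* Reset touches only the bit table, not the (much larger) key array. */
static void sketch_hash_set_reset(struct sketch_hash_set * hs) {
    memset(hs->used, 0, sketch_bitset_words(hs->size) * sizeof(sketch_bitset_t));
}

static bool sketch_slot_used(const struct sketch_hash_set * hs, size_t i) {
    return (hs->used[i / SKETCH_BITS] >> (i % SKETCH_BITS)) & 1u;
}

/* Linear-probing insert; returns true if the key was newly added. */
static bool sketch_hash_set_insert(struct sketch_hash_set * hs, const void * key) {
    size_t h = ((uintptr_t) key >> 4) % hs->size;
    for (size_t n = 0; n < hs->size; n++) {
        size_t i = (h + n) % hs->size;
        if (!sketch_slot_used(hs, i)) {
            hs->used[i / SKETCH_BITS] |= (sketch_bitset_t) 1 << (i % SKETCH_BITS);
            hs->keys[i] = key;
            return true;
        }
        if (hs->keys[i] == key) {
            return false; /* already present */
        }
    }
    abort(); /* table is full */
}

static bool sketch_hash_set_contains(const struct sketch_hash_set * hs, const void * key) {
    size_t h = ((uintptr_t) key >> 4) % hs->size;
    for (size_t n = 0; n < hs->size; n++) {
        size_t i = (h + n) % hs->size;
        if (!sketch_slot_used(hs, i)) {
            return false; /* an unused slot ends the probe sequence */
        }
        if (hs->keys[i] == key) {
            return true;
        }
    }
    return false;
}

static void sketch_hash_set_free(struct sketch_hash_set * hs) {
    free(hs->used);
    free((void *) hs->keys);
}
```

In this sketch a table with 1024 slots resets by clearing 32 words (128 bytes), whereas clearing the key array on a system with 8-byte pointers would touch 8192 bytes, which matches the factor of 64 quoted above. The key array can stay uninitialized because a key is only read after its "used" bit has been checked.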