
Paddle Error Message Writing Specification (English Version)

Chen Weihang edited this page Aug 27, 2021 · 9 revisions

Paddle Error Message Writing Specification


Paddle Error Message Writing Specification (Chinese version)


Specification summary:

  • Section 1, the error message writing template, is a recommendation for reference. If you have a simpler and more user-friendly way of writing a message, use it flexibly.
  • Section 2, the mandatory specification entries, lists the rules that error messages must follow. The first three are monitored by CI.
  • Section 3, the error message sample library, contains existing PADDLE_ENFORCE calls extracted from Paddle and rewritten as compliant examples for easy reference.
  • Appendix: when the specification is refined later, first record the rationale and the content to be changed in the appendix, then modify the specification itself.

Additional instructions:

  1. During implementation, cases may come up that the existing specification does not cover, so it needs to be supplemented and improved along the way. Feedback is welcome.
  2. The current version of the specification defines 12 error types. If you find an error that none of them covers, you can apply to add a new type.
  3. The richer the sample library, the more useful it is as a reference, so you are encouraged to add new examples.
  4. Matching messages against the specification is complicated, and a message that conforms to the specification may be flagged as non-compliant. If that happens, contact chenwhql (Chen Weihang).

1. Error message writing template

It is recommended that the messages of PADDLE_ENFORCE_* and PADDLE_THROW follow the structure below:

Note: The key to an error message is describing the error clearly. The template is for reference only.

Three-part error message (error - expected - suggestion)

First paragraph: state the error (required)

  • State the error directly:

    • Recommended descriptions:
      • A is wrong, B is not initialized, C does not exist, D has an incorrect value, E does not match, etc.
        • example: Mismatched label shape.
    • Not recommended: A should be ..., B should not be ...
      • When something goes wrong, tell the user the error directly first
      • Unless necessary, do not point out the error in a should/should not tone
      • What should or should not be the case belongs in the second paragraph, which explains the expected result
  • Notes for this paragraph:

    1. Attribute variables should identify the subject of the error. For example, for an Op input or output, state which Op's input or output is wrong, and distinguish between the forward and backward Op.
    2. Stating the error means telling the user a fact. Magic numbers (numbers whose meaning is unclear) are generally not allowed; express the error in English sentences.

Second paragraph: compare expected and actual values (provide whenever possible)

  • State what input is expected and what input was actually received.

    • example: Expected labels dimension=1. Received 4.
  • Notes for this paragraph:

    1. Provide all the information needed. For a Shape error, for example, print the specific Shapes being compared and indicate which dimension is wrong.
    2. This paragraph can be omitted if the error in the first paragraph is self-explanatory. For example, if A is a null pointer or B does not exist, there is no need to add that A is expected to be non-null or that B should exist.

Third paragraph: suggested fix (provide whenever possible)

  • Explain what caused the error and how to fix it

    • example: Suggested Fix: If your classifier expects one-hot encoding label, check your n_classes argument to the estimator and/or the shape of your label. Otherwise, check the shape of your label.
  • Notes for this paragraph:

    • A suggested fix can be written when it applies generally to a common problem, such as
      • Startup_program is not executed
      • An important parameter is not set
      • There may be a problem with an environment configuration
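
The three-part structure above can be sketched as a small helper that joins the parts into one message. This is only an illustration: `BuildErrorMessage` is a hypothetical function, not part of Paddle, and the "Suggested Fix:" prefix is borrowed from the example in the third paragraph.

```cpp
#include <string>

// Hypothetical helper (not part of Paddle): joins the three parts of the
// template into one message. Part 1 is mandatory; parts 2 and 3 are
// appended only when they are not empty.
std::string BuildErrorMessage(const std::string& error,
                              const std::string& expected_vs_received,
                              const std::string& suggestion) {
  std::string message = error;
  if (!expected_vs_received.empty()) {
    message += " " + expected_vs_received;
  }
  if (!suggestion.empty()) {
    message += " Suggested Fix: " + suggestion;
  }
  return message;
}
```

Calling it with "Mismatched label shape." and "Expected labels dimension=1. Received 4." and an empty suggestion yields the two sentences joined by a single space.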

2. Mandatory specification entries

The messages of PADDLE_ENFORCE_* and PADDLE_THROW must comply with the following entries:

1. Omitted or empty messages are not allowed (monitored by CI)

  • Incorrect examples:
PADDLE_ENFORCE(ctx->HasInput("X"));

PADDLE_ENFORCE(ctx->HasInput("X"), "");

2. Messages must not be too short: they must be at least 20 characters long (monitored by CI)

  • Incorrect example:
PADDLE_ENFORCE(i != nullptr, "I must be set");
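
Rules 1 and 2 can be expressed as a simple predicate. The sketch below is not the actual CI check, whose implementation is not described here; `IsMessageAcceptable` and `kMinMessageLength` are hypothetical names, and the threshold of 20 is taken from rule 2.

```cpp
#include <cstddef>
#include <cstring>

// Assumed threshold, taken from rule 2 above.
constexpr std::size_t kMinMessageLength = 20;

// Hypothetical predicate (not the real CI script): rejects missing or empty
// messages (rule 1) and messages shorter than 20 characters (rule 2).
bool IsMessageAcceptable(const char* message) {
  return message != nullptr && std::strlen(message) >= kMinMessageLength;
}
```

Under this sketch, "I must be set" (13 characters) is rejected, while "Input(X) of MulOp is not found." is accepted.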

3. The error type must be specified (monitored by CI)

  • 12 error types are currently defined (see the examples in Section 3 for details).
    • InvalidArgument
    • NotFound
    • OutOfRange
    • AlreadyExists
    • ResourceExhausted
    • PreconditionNotMet
    • PermissionDenied
    • ExecutionTimeout
    • Unimplemented
    • Unavailable
    • Fatal
    • External

Usage summary: wrap the entire error message (the format string and its variadic argument list) in platform::errors::ErrorType().

A brief example (note the position of the parentheses):

  • Old: PADDLE_ENFORCE(true, "example: %s", str);
  • New: PADDLE_ENFORCE(true, platform::errors::InvalidArgument("example: %s", str));

Correct example:

PADDLE_ENFORCE_GT(y_dims.size(), y_num_col_dims,
                      platform::errors::InvalidArgument("The input tensor Y's dimensions of MulOp "
                      "should be larger than y_num_col_dims. But received Y's "
                      "dimensions = %d, Y's shape = [%s], y_num_col_dims = %d.",
                      y_dims.size(), y_dims, y_num_col_dims));

Incorrect example:

PADDLE_ENFORCE_GT(y_dims.size(), y_num_col_dims,
                      "The input tensor Y's dimensions of MulOp "
                      "should be larger than y_num_col_dims. But received Y's "
                      "dimensions = %d, Y's shape = [%s], y_num_col_dims = %d.",
                      y_dims.size(), y_dims, y_num_col_dims);

Note: PADDLE_ENFORCE under CUDA_ARCH does not yet support declaring an error type. If you encounter this case, ask an approver to approve it.
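
To make the position of the parentheses concrete, here is a self-contained mock of the wrapping pattern: the error-type function receives the format string together with its arguments and returns one object. `ErrorSummary`, `Format`, and this `errors` namespace are hypothetical stand-ins written for illustration; they are not Paddle's implementation.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an error object: a type name plus the
// formatted message.
struct ErrorSummary {
  std::string type;
  std::string message;
};

// Formats printf-style arguments into a std::string.
template <typename... Args>
std::string Format(const char* fmt, Args... args) {
  int size = std::snprintf(nullptr, 0, fmt, args...);
  if (size <= 0) return std::string();
  std::string out(static_cast<std::size_t>(size) + 1, '\0');
  std::snprintf(&out[0], out.size(), fmt, args...);
  out.resize(static_cast<std::size_t>(size));  // drop the trailing NUL
  return out;
}

namespace errors {
// The format string and its arguments both go inside the error-type call,
// so a check macro only ever receives a single error object.
template <typename... Args>
ErrorSummary InvalidArgument(const char* fmt, Args... args) {
  return {"InvalidArgumentError", Format(fmt, args...)};
}
}  // namespace errors
```

With this mock, `errors::InvalidArgument("example: %s", "abc")` produces an object whose message is "example: abc", which mirrors the "New" form shown above.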

4. Abbreviations of variable names defined by C++ developers are not allowed in messages; expand them into full English words.

Incorrect example:

PADDLE_ENFORCE(forward_pd != nullptr,
               "Fail to find eltwise_fwd_pd in device context");

5. Make sure the message contains no grammatical errors

Incorrect example:

PADDLE_ENFORCE(context->HasInput("X"),
               "ArrayToLoDTensorOp must has input X."); //must has?

3. Error message valid sample library

Developers may interpret the rules above differently and may be unsure how to classify an error. This section therefore provides examples for as many error types as possible, together with reference wording for the related messages. When improving error messages, please refer to the examples here.

1. InvalidArgument

The user passed in an illegal argument. This covers all kinds of argument errors and is expected to be the most common error type.

1.1 ShapeError

PADDLE_ENFORCE_EQ(
    output_shape[unk_dim_idx] * capacity, -in_size,
    platform::errors::InvalidArgument(
        "The 'shape' attribute in ReshapeOp is invalid. "
        "The input tensor X's size must be divisible by known "
        "capacity of 'shape'. "
        "But received X's shape = [%s], X's size = %d, "
        "'shape' is [%s], known "
        "capacity of 'shape' is %d.",
        in_dims, in_size, framework::make_ddim(shape), capacity));

1.2 The parameter is empty (list is empty, null pointer, etc.)

PADDLE_ENFORCE_NE(vars.empty(), true, platform::errors::InvalidArgument(
                                          "Variable names are empty."));

1.3 The parameter is incorrect and is not equal to the expected value.

PADDLE_ENFORCE_GT(batch_size, 0, platform::errors::InvalidArgument(
                                    "Batch size %d is illegal.", batch_size));

PADDLE_ENFORCE_NE(
    num, 0,
    platform::errors::InvalidArgument(
        "The number of ids can not be zero, you need padding "
        "it in data generator, or if there is something wrong with "
        "the data, please check if the data contains unresolvable "
        "characters.\nplease check this error line: %s.",
        str));

1.4 Incorrect parameter format

PADDLE_ENFORCE_NE(in.format(), MKLDNNMemoryFormat::format_undef,
          platform::errors::InvalidArgument(
              "Input tensor format is invalid. Input tensor should "
              "have specified memory format."));

1.5 Parameter not initialized

PADDLE_ENFORCE_EQ(proto_->IsInitialized(), true,
                  platform::errors::InvalidArgument(
                      "Operator's Proto in op info is not initialized."));

PADDLE_ENFORCE_EQ(
    t->IsInitialized(), true,
    platform::errors::InvalidArgument(
        "The Tensor in the %s Op's Input Variable %s(%s) is "
        "not initialized.",
        Type(), name, ctx.Inputs(name).at(i)));

1.6 Incorrect parameter type

PADDLE_ENFORCE(
    tmp == *data_type || *data_type == dafault_data_type,
    platform::errors::InvalidArgument(
        "The DataType of %s Op's duplicable Variable %s must be "
        "consistent. The current variable type is (%s), but the "
        "previous variable type is (%s).",
        Type(), name, DataTypeToString(tmp),
        DataTypeToString(*data_type)));

PADDLE_ENFORCE_EQ(
    valid, true,
    platform::errors::InvalidArgument(
        "Tensor holds the wrong type, it holds %s, but desires to be %s.",
        DataTypeToString(type_),
        DataTypeToString(DataTypeTrait<T>::DataType())));

1.7 Parameter parsing error

PADDLE_ENFORCE_EQ(success, true,
                  platform::errors::InvalidArgument(
                      "Fail to parse DataFeedDesc from string: %s.",
                      data_feed_desc_str.c_str()));

1.8 LoD error

PADDLE_ENFORCE_GT(lod_level, 0, platform::errors::InvalidArgument(
                                    "Input(X) Tensor of SequencePoolOp "
                                    "does not contain LoD information."));

2. NotFound

The requested entity cannot be found: the variable being looked up is empty, an input or output does not exist, etc.

  • This is distinct from a null pointer: a variable that is not found and a variable that is not correctly assigned are two different concepts

2.1 Op input and output not found

PADDLE_ENFORCE_EQ(
    ctx->HasInput("X"), true,
    platform::errors::NotFound("Input(X) of MulOp is not found."));
PADDLE_ENFORCE_EQ(
    ctx->HasInput("Y"), true,
    platform::errors::NotFound("Input(Y) of MulOp is not found."));
PADDLE_ENFORCE_EQ(
    ctx->HasOutput("Out"), true,
    platform::errors::NotFound("Output(Out) of MulOp is not found."));

2.2 Missing node

PADDLE_ENFORCE_NOT_NULL(
    p, platform::errors::NotFound("subgraph has no node %s.", name.c_str()));

2.3 File not found

PADDLE_ENFORCE_GT(file_cnt, 0,
                  platform::errors::NotFound("Input file list is empty."));

2.4 Other

PADDLE_ENFORCE_NOT_NULL(
    var_desc, platform::errors::NotFound("%s is not found.", var_name));

PADDLE_ENFORCE_NOT_NULL(
    proto_,
    platform::errors::NotFound("Operator's Proto has not been registered."));

3. OutOfRange

PADDLE_ENFORCE_LT(
    i, N, platform::errors::OutOfRange("Array index out of bounds."));

PADDLE_ENFORCE_GT(value, lower_bound_,
                  platform::errors::OutOfRange("Attribute GreaterThan check failed."));

4. AlreadyExists

The entity being looked up already exists, or multiple instances are found of something that only allows a single instance.

PADDLE_ENFORCE_EQ(
    attrs_.count(attr_name), 0,
    platform::errors::AlreadyExists(
        "The attribute %s has been set in the graph.", attr_name));

PADDLE_ENFORCE_NE(Has(pass_type), true, 
    platform::errors::AlreadyExists(
        "Pass %s has been registered.", pass_type));

PADDLE_ENFORCE_LE(ins.size(), 1UL,
    platform::errors::AlreadyExists(
        "Operator %s's input %s should contain only one variable.", type_, name));
                    
PADDLE_ENFORCE_EQ(
    fused_var_set.count(fused_var_name), 0,
    platform::errors::AlreadyExists(
         "The fused variable already exists."));

5. PermissionDenied

The current operation is not allowed to be executed.

PADDLE_ENFORCE_NE(a, b, platform::errors::PermissionDenied(
                            "Cannot connect the same node in the graph."));

6. ResourceExhausted

PADDLE_THROW_BAD_ALLOC(platform::errors::ResourceExhausted(
    "\n\nOut of memory error on GPU %d. "
    "Cannot allocate %s memory on GPU %d, "
    "available memory is only %s.\n\n"
    "Please check whether there is any other process using GPU %d.\n"
    "1. If yes, please stop them, or start PaddlePaddle on another GPU.\n"
    "2. If no, please decrease the batch size of your model.\n",
    place_.device, string::HumanReadableSize(size), place_.device,
    string::HumanReadableSize(avail), place_.device));

PADDLE_THROW_BAD_ALLOC(platform::errors::ResourceExhausted(
     "\n\nOut of memory error on GPU %d. "
     "Cannot allocate %s memory on GPU %d, "
     "available memory is only %s.\n\n"
     "Please check whether there is any other process using GPU %d.\n"
     "1. If yes, please stop them, or start PaddlePaddle on another GPU.\n"
     "2. If no, please try one of the following suggestions:\n"
     "   1) Decrease the batch size of your model.\n"
     "   2) FLAGS_fraction_of_gpu_memory_to_use is %.2lf now, "
     "please set it to a higher value but less than 1.0.\n"
     "      The command is "
     "`export FLAGS_fraction_of_gpu_memory_to_use=xxx`.\n\n",
     gpu_id_, string::HumanReadableSize(size), gpu_id_,
     string::HumanReadableSize(avail), gpu_id_,
     FLAGS_fraction_of_gpu_memory_to_use));

7. PreconditionNotMet

The currently executed operation requires certain prerequisites to be met before it can be executed.

PADDLE_ENFORCE_NOT_NULL(
    mutex_for_pick_file_,
    platform::errors::PreconditionNotMet(
        "You should call SetFileListMutex before PickOneFile"));

PADDLE_ENFORCE_NOT_NULL(
    root_scope_,
    platform::errors::PreconditionNotMet(
        "root_scope should be set before creating thread scope."));

PADDLE_ENFORCE_NE(
    fetched_var_it, fetched_vars->end(),
    platform::errors::PreconditionNotMet(
        "Cannot find fetched variable(%s). Perhaps the main_program "
        "is not set to ParallelExecutor.",
        var_name));

PADDLE_ENFORCE_EQ(finish_start_, true,
                  platform::errors::PreconditionNotMet(
                      "Datafeed has not started running yet."));

PADDLE_ENFORCE_NE(framework::product(y_dims), 0,
                  platform::errors::PreconditionNotMet(
                      "The Input variable Y(%s) has not "
                      "been initialized. You may need to confirm "
                      "if you put exe.run(startup_program) "
                      "after optimizer.minimize function.",
                      ctx->Inputs("Y").front()));

PADDLE_ENFORCE_NE(FLAGS_use_ngraph, true,
                  platform::errors::PreconditionNotMet(
                      "Please compile with NGRAPH first to use NGRAPH."));

8. ExecutionTimeout

The execution response time is too long, or the communication timed out.

No sample has been found yet; one will be added later.

9. Unimplemented

Not yet implemented or supported, but may be implemented later

PADDLE_ENFORCE_NE(iter, operations_.end(),
                  platform::errors::Unimplemented(
                      "Operation %s is not supported yet.", op_type));

PADDLE_ENFORCE_EQ(
    all_reduce_ops.size(), grads.size(),
    platform::errors::Unimplemented(
        "The number of all_reduce OpHandle is not equal to the "
        "number of grads. Maybe some gradients are sparse type, "
        "it is not supported currently."));

10. Unavailable

The current service is not available or the current operation cannot be performed.

10.1 IO error

PADDLE_ENFORCE_NE(file_descriptor, -1, platform::errors::Unavailable(
                                            "Cannot open file %s.", filename));

PADDLE_ENFORCE_EQ(fin.good(), true, platform::errors::Unavailable(
                                        "Cannot open file %s.", filename));

PADDLE_ENFORCE_EQ(
    file.is_open(), true,
    platform::errors::Unavailable("Can not open %s to add benchmark.", path));

11. Fatal

Unexpected, serious errors, such as segmentation faults.

Used when adding try-catch to handle unexpected exceptions. Developers will not need it for the time being.

12. External

PADDLE_ENFORCE_CUDA_SUCCESS(
    cudaEventCreate(&event_, cudaEventDisableTiming),
    platform::errors::External(
        "Create event failed in CUDADeviceContextAllocator"));

4. Specification updates and additions

1. Added OP_INOUT_CHECK macro for Op InferShape

  • In Op InferShape, the input and output checks use very similar error types and error messages, but because no error type was declared before, all of them need to be modified.

  • A new check macro has been added to handle this kind of check. A usage example:

OP_INOUT_CHECK(ctx->HasInput("X"), "Input", "X", "Mul");

  • Pass, in order, the conditional expression, "Input" or "Output", the name of the Op input or output, and the Op name.

  • This simplifies the code and reduces everyone's workload. It also keeps the error messages of all Op input and output checks consistent, unifies the various existing styles, and avoids grammatical problems.

  • Using this macro is not mandatory, however; the original style is still acceptable as long as it is compliant

For details, see [PR23430] (https://github.com/PaddlePaddle/Paddle/pull/23430)
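
The idea behind OP_INOUT_CHECK, building one uniform message from the role, the name, and the Op type, can be sketched with a plain function. `CheckOpInOut` is a hypothetical stand-in, the message wording is modeled on the NotFound examples in Section 3 rather than on the macro's real text, and `std::runtime_error` is used only because this sketch has no Paddle exception type available.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the check behind OP_INOUT_CHECK. The wording of
// the message is an assumption, not the macro's actual output.
void CheckOpInOut(bool exists, const std::string& role,
                  const std::string& name, const std::string& op_type) {
  if (!exists) {
    throw std::runtime_error("NotFoundError: " + role + "(" + name + ") of " +
                             op_type + "Op is not found.");
  }
}
```

Because every call site goes through the same template, `CheckOpInOut(false, "Input", "X", "Mul")` and a check for any other Op produce messages with identical structure.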

2. PADDLE_ENFORCE_CUDA_SUCCESS is upgraded to a special macro that needs no error type

  • PADDLE_ENFORCE_CUDA_SUCCESS handles errors from CUDA and its related libraries. Most errors of this kind are caused by the environment, so even if developers are required to write an error message, it rarely provides a useful hint. Using the error code and description officially provided by CUDA is a better choice, so the macro has been upgraded.

  • When using this macro, simply write the CUDA call inside it; the error type and error message are no longer needed. For example:

PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::ncclAllReduce(
    src_ptr, dst_ptr, src.numel(), nccl_dtype, ncclSum, comm, stream));

For details, see [PR23816] (https://github.com/PaddlePaddle/Paddle/pull/23816)

3. Added GET_DATA_SAFELY macro to replace the remaining non-compliant error-reporting function Ref

  • More than 100 places in Paddle use the Ref function to check whether a pointer is null and to report information about the illegal use. Most of these places provide no error message or only a very short one, which cannot tell the user where the error really is, so a new GET_DATA_SAFELY macro replaces Ref.

  • The GET_DATA_SAFELY macro is recommended only for getting an Op's input or output Variable, or the LoDTensor or SelectedRows held in a Variable. The macro checks whether the input pointer is null. If it is null, the macro builds an error message from the template and the arguments and throws an exception; if it is not null, it returns the pointer. For example:

GET_DATA_SAFELY(ctx.Input<LoDTensor>("X"), "Input", "X", "Mul");

For details, see [PR22997] (https://github.com/PaddlePaddle/Paddle/pull/22997)
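
The behavior described for GET_DATA_SAFELY (check the pointer, throw with a templated message if it is null, otherwise return it) can be sketched as a function template. `GetDataSafely` is a hypothetical name, and both the message text and the use of `std::runtime_error` are assumptions made for this illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function form of the null check described above; the real
// macro and its message template may differ.
template <typename T>
T* GetDataSafely(T* ptr, const std::string& role, const std::string& name,
                 const std::string& op_type) {
  if (ptr == nullptr) {
    throw std::runtime_error("NotFoundError: Unable to get " + role + "(" +
                             name + ") of " + op_type +
                             " operator: the pointer is null.");
  }
  return ptr;  // non-null: hand the same pointer back to the caller
}
```

Returning the pointer lets the check sit inline in an expression, so the caller does not need a separate null test before using the value.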

4. Added BOOST_GET series macros and prohibited direct use of boost::get

  • boost::get is an unsafe third-party library function. When get fails it throws boost::bad_get directly, with no stack information and no file or line number, which makes this class of problems hard to debug. The newly defined BOOST_GET series of macros therefore replaces more than 600 uses of boost::get in Paddle.

  • Three BOOST_GET macros have been added:

    • BOOST_GET (e.g. return int& or int*)
    • BOOST_GET_CONST (e.g. return const int& or const int*)
    • BOOST_GET_MUTABLE (e.g. return int or int*)

These three macros cover the uses of boost::get in Paddle.

  • CI has a new rule that disallows direct use of boost::get; use the new BOOST_GET series macros instead. If you do not want to use them, you can add a try-catch around boost::get and ask the designated person to approve it.

For details, see [PR24175] (https://github.com/PaddlePaddle/Paddle/pull/24175)
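
The motivation for the BOOST_GET macros, attaching the file and line of the call site to a failed get, can be illustrated without Boost. The sketch below assumes C++17 and uses `std::variant` and `std::get` as stand-ins for `boost::variant` and `boost::get`; `SafeGet` and `SAFE_GET_CONST` are hypothetical names and do not reproduce Paddle's macros.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical wrapper: converts the bare bad-access exception into an
// error that names the call site.
template <typename T, typename Variant>
const T& SafeGet(const Variant& value, const char* file, int line) {
  try {
    return std::get<T>(value);
  } catch (const std::bad_variant_access&) {
    throw std::runtime_error(
        std::string("Type mismatch when reading variant at ") + file + ":" +
        std::to_string(line));
  }
}

// A macro is needed so that __FILE__ and __LINE__ expand at the call site.
#define SAFE_GET_CONST(TYPE, VALUE) SafeGet<TYPE>((VALUE), __FILE__, __LINE__)
```

A function alone could not capture the caller's location this way, which is one reason the wrapper is exposed as a macro in this sketch.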

5. Prohibited the use of LOG(FATAL)

  • LOG(FATAL) can also throw an error while the program is running, but the error is hard to read: it contains a lot of irrelevant information and lacks key information such as the file and line number of the error. The more than 100 uses of LOG(FATAL) in Paddle have been replaced by PADDLE_THROW, and CI now prohibits the use of LOG(FATAL).

Appendix

Record of specification updates

1. Recent Paddle error message optimization changes (from 2021.01.12)

The main changes to the error format are:

  1. The error stack is further streamlined. Version 2.0 hides the C++ stack by default; it can be restored by setting FLAGS_call_stack_level=2. The C++ stack is not hidden when a serious error such as a segmentation fault occurs, which makes debugging easier.
  2. The uncommon paddle.fluid.core_avx.EnforceNotMet is replaced with Python native exception types. The mapping is as follows:
Paddle exception mapping rules:
   - InvalidArgumentError -> ValueError
   - NotFoundError -> RuntimeError
   - OutOfRangeError -> IndexError
   - AlreadyExistsError -> RuntimeError
   - ResourceExhaustedError -> MemoryError
   - PreconditionNotMetError -> RuntimeError
   - PermissionDeniedError -> RuntimeError
   - ExecutionTimeoutError -> RuntimeError
   - UnimplementedError -> NotImplementedError
   - UnavailableError -> RuntimeError
   - FatalError -> SystemError
   - ExternalError -> OSError
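
For quick reference, the mapping above can be written as a lookup table. `PythonExceptionFor` is a hypothetical helper for illustration only; it is not how Paddle performs the translation, and returning an empty string for an unknown name is a choice made for this sketch.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup that restates the mapping table above. Returns an
// empty string for names that are not in the table.
std::string PythonExceptionFor(const std::string& paddle_error) {
  static const std::unordered_map<std::string, std::string> kMapping = {
      {"InvalidArgumentError", "ValueError"},
      {"NotFoundError", "RuntimeError"},
      {"OutOfRangeError", "IndexError"},
      {"AlreadyExistsError", "RuntimeError"},
      {"ResourceExhaustedError", "MemoryError"},
      {"PreconditionNotMetError", "RuntimeError"},
      {"PermissionDeniedError", "RuntimeError"},
      {"ExecutionTimeoutError", "RuntimeError"},
      {"UnimplementedError", "NotImplementedError"},
      {"UnavailableError", "RuntimeError"},
      {"FatalError", "SystemError"},
      {"ExternalError", "OSError"},
  };
  auto it = kMapping.find(paddle_error);
  return it == kMapping.end() ? std::string() : it->second;
}
```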

1.1 Static graph error format optimization

Example of Paddle error message before the change:

[Image: Paddle error message example before the change]

Example of Paddle error message after the change:

[Image: Paddle error message example after the change]

During the transition period for hiding the C++ stack, the message kept a hint on how to display the C++ stack (see above). The hint was removed in the official 2.0 release.

1.2 Dynamic graph error format optimization

Example of Paddle error message before the change:

[Image: Paddle error message example before the change]

Example of Paddle error message after the change:

[Image: Paddle error message example after the change]

During the transition period for hiding the C++ stack, the message kept a hint on how to display the C++ stack (see above). The hint was removed in the official 2.0 release.

2. Recent Paddle error message optimization changes (from 2019.11.13)

Original Paddle error message example:

[Image: original Paddle error message example]

Example of Paddle error message after optimization:

[Image: Paddle error message example after optimization]

3. Error type adding process

Take the error type UNKNOWN as an example:

Step 1: Add a new error code in paddle/fluid/platform/error_codes.proto

UNKNOWN = 13;

Step 2: Register the new error type in paddle/fluid/platform/s.h

REGISTER_ERROR(Unknown, UNKNOWN)

Step 3: Add a new error string to paddle/fluid/platform/errors.cc

case paddle::platform::error::UNKNOWN:
  return "UnknownError";
  break;

Step 4: Use in the code

PADDLE_ENFORCE_EQ(flag, true, platform::errors::Unknown("example"));
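
The four steps fit together as follows in a self-contained miniature. Everything here (`Code`, `Error`, `ErrorTypeName`, and the free function `Unknown`) is a hypothetical mock written for illustration; only the value `UNKNOWN = 13` and the string "UnknownError" come from the steps above, and the value given to `INVALID_ARGUMENT` is illustrative.

```cpp
#include <string>

// Step 1 (mock): the error codes. The INVALID_ARGUMENT value is illustrative.
enum class Code { INVALID_ARGUMENT = 1, UNKNOWN = 13 };

// Hypothetical error object carrying the code and the message.
struct Error {
  Code code;
  std::string message;
};

// Step 2 (mock): a factory for the new type, which is roughly what
// registering an error type makes available to callers.
Error Unknown(const std::string& message) {
  return {Code::UNKNOWN, message};
}

// Step 3 (mock): map each code to the string shown to the user.
const char* ErrorTypeName(Code code) {
  switch (code) {
    case Code::INVALID_ARGUMENT:
      return "InvalidArgumentError";
    case Code::UNKNOWN:
      return "UnknownError";
  }
  return "";
}
```

Step 4 then corresponds to calling `Unknown("example")` at the check site and printing `ErrorTypeName(error.code)` alongside the message.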

4. Error type extension record

If the existing 12 error types cannot cover an error encountered in practice, you can apply for a new error type. The application should state the following:

  • Name of the new error type
  • Description of the scenarios where the new error type applies
  • PADDLE_ENFORCE examples of the new error type (at least 3)