Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hexadecimal representation of negative integers #235

Closed
polyvertex opened this issue Nov 26, 2015 · 23 comments
Closed

Hexadecimal representation of negative integers #235

polyvertex opened this issue Nov 26, 2015 · 23 comments

Comments

@polyvertex
Copy link
Contributor

When using format string {:#08x} to display the 32bit integer value -2147023083 in hexadecimal, I get -0x7ff8f8eb. While I understand this result is mathematically correct, I was expecting to read 0x80070715 instead since I usually use hexadecimal output to get a more intuitive representation of a variable in memory. Is there an option (compile-time or runtime) to change this default behavior without having to explicitly cast my arguments?

In case it matters, I'm using release 1.1.0.

@vitaut
Copy link
Contributor

vitaut commented Nov 26, 2015

No, there is no such option, but there are some alternatives:

  • Use fmt::printf/fmt::sprintf which don't print the sign

    fmt::printf("%#08x", -2147023083); // prints 0x80070715
  • Add a helper function to do the cast (a bit easier than casting every time):

    unsigned UInt(int value) {
      return static_cast<unsigned>(value);
    }
    fmt::print("{:#08x}", UInt(-2147023083));

@polyvertex polyvertex changed the title Hexadecimal for negative integers Hexadecimal representation of negative integers Nov 26, 2015
@vitaut
Copy link
Contributor

vitaut commented Dec 1, 2015

@polyvertex Do any of the proposed alternatives work for you?

@polyvertex
Copy link
Contributor Author

The fmt::sprintf alternative doesn't blend well with the existing code and casting is something I would like to avoid while I admittedly was already doing that before posting here. I considered modifying cppformat to offer this feature as an option and eventually make a pull request if that would be of interest to anyone, but it appears this modification may be quite invasive.

Out of curiosity, how did you make the choice to behave differently than std::printf for that case? I essentially used wrappers over std::sprintf and friends to format strings before intensively migrating to your library so I'm not aware of how other formatting libraries behave. Hence my surprise when I stumbled upon this result.

@vitaut
Copy link
Contributor

vitaut commented Dec 2, 2015

Out of curiosity, how did you make the choice to behave differently than std::printf for that case?

fmt::format and fmt::print are based on Python's str.format and mostly follow its conventions:

>>> '{:x}'.format(-42)
'-2a'

while fmt::sprintf and fmt::printf follow std::printf's.

It's hard to say which one is the best. I think it's reasonable to format signed integers with a sign regardless of the base, but from your comments I see that the other option might be useful too.

One possibility is to pass a custom argument formatter as a template argument to BasicFormatter. Then you could easily implement your custom formatting functions that do the necessary conversions to unsigned, something like:

class CustomArgFormatter : public fmt::BasicArgFormatter<CustomArgFormatter, char>  {
 public:
  typedef fmt::BasicArgFormatter<CustomArgFormatter, char> Base;
  CustomArgFormatter(fmt::BasicFormatter<char, CustomArgFormatter> &f, fmt::FormatSpec &s, const char *)
  : fmt::BasicArgFormatter<CustomArgFormatter, char>(f.writer(), s) {}

  void visit_int(int value) {
    fmt::FormatSpec &spec = this->spec();
    if (spec.type() == 'x')
      visit_uint(value); // convert to unsigned and format
    else
      Base::visit_int(value);
  }
};

std::string format(const char *format_str, fmt::ArgList args) {
  fmt::MemoryWriter writer;
  fmt::BasicFormatter<char, CustomArgFormatter> formatter(args, writer);
  formatter.format(format_str);
  return writer.str();
}
FMT_VARIADIC(std::string, format, const char *)

This will require some changes to the library. I did a quick and dirty prototype to see that this is feasible and if you interested I can clean it up and push the changes.

@polyvertex
Copy link
Contributor Author

IMO the custom argument formatter thing would be great as it would save the library from the creation of an exotic option. This would perfectly fit my needs actually since I would also like to interpret the HASH_FLAG differently than the default behavior.

Would that be feasible to use this custom formatter along/integrated with a custom Writer as described in #140? This is the way I use cppformat mainly.

PS: I'm still amazed by the amount of time you spend maintaining and improving your library as well as offering close support. As a user, I just hope it won't become a bloat of features!

@vitaut
Copy link
Contributor

vitaut commented Dec 4, 2015

IMO the custom argument formatter thing would be great as it would save the library from the creation of an exotic option. This would perfectly fit my needs actually since I would also like to interpret the HASH_FLAG differently than the default behavior.

Cool, let's go with this option then. I've done some preliminary support on custom formatters, but still need to make argument formatter customizable.

Would that be feasible to use this custom formatter along/integrated with a custom Writer as described in #140?

BasicFormatter can use any writer derived from BasicWriter, so I don't see any problem with this.

Thanks for the nice feedback =). I do try to keep the library lean and avoid feature creep. Therefore providing a facility to implement some feature is preferred to adding this feature directly (unless it is something that is likely to be widely used).

@vitaut
Copy link
Contributor

vitaut commented Mar 19, 2016

Mostly done in 52f8906. Just need to move the argument formatter class from fmt::internal to fmt namespace and document the new functionality.

@vitaut
Copy link
Contributor

vitaut commented Mar 19, 2016

Here's a working example:

#include "format.h"

class TrumpFormatter : public fmt::internal::ArgFormatterBase<TrumpFormatter, char>  {
 public:
  typedef fmt::internal::ArgFormatterBase<TrumpFormatter, char> Base;

  TrumpFormatter(fmt::BasicFormatter<char, TrumpFormatter> &f, fmt::FormatSpec &s, const char *)
  : fmt::internal::ArgFormatterBase<TrumpFormatter, char>(f.writer(), s) {}

  void visit_cstring(const char *str) {
    if (std::strcmp(str, "Trump") == 0)
      str = "Someone With Tiny Hands";
    Base::visit_cstring(str);
  }
};

void trump_print(const char *format_str, fmt::ArgList args) {
  fmt::MemoryWriter writer;
  fmt::BasicFormatter<char, TrumpFormatter> formatter(args, writer);
  formatter.format(format_str);
  std::puts(writer.c_str());
}
FMT_VARIADIC(void, trump_print, const char *)

int main() {
  trump_print("{} has his \"Make America Great Again\" hats.", "Trump");
}

It prints:

Someone With Tiny Hands has his "Make America Great Again" hats.

@polyvertex
Copy link
Contributor Author

It feels like Christmas :)

@MrSapps
Copy link

MrSapps commented Mar 23, 2016

Edit: Solved earlier issue.

Can this work with {} style strings? The base class seems to be looking for printf style format strings?

@vitaut
Copy link
Contributor

vitaut commented Mar 23, 2016

This works with Python-like {} format strings (please see the example above). ArgFormatterBase handles common format specifiers while PrintfArgFormatter implements printf-specific adjustments. Since the two formats have a lot in common PrintfArgFormatter has very little work to do.

@MrSapps
Copy link

MrSapps commented Mar 24, 2016

Not sure what I was doing yesterday, it clearly works with {} strings as you say! I'm trying to do this:

void visit_int(int value)
    {
        if (spec().type() == 'X')
        {
            // Format all signed hex values as signed
            visit_uint(value);
        }
        else
        {
            Base::visit_int(value);
        }
    }

    void visit_uint(unsigned int value)
    {
        if (spec().type() == 'X')
        {
            // Always add "0x" if the user didn't specify it
            if (!spec().flag(fmt::HASH_FLAG))
            {
                mWriter << "0x";
            }

            // Pad with 0's up to the "size" of the type
            spec().width_ = sizeof(value) * 2;
            spec().fill_ = '0';
        }

        Base::visit_uint(value);
    }

This can work for int/unsigned int and __int64/unsigned _int64. However for short int and unsigned short int there is no visit_shortint function, is there a way to get the size of the type in bytes? I'd like this so I can set the spec().width = sizeof(value) * 2; correctly so that short int t = 0xde appears as 0x00DE instead of 0x000000DE.

@vitaut
Copy link
Contributor

vitaut commented Mar 24, 2016

short is promoted to int as in printf, so you can't check if the original value was short unfortunately.

vitaut added a commit that referenced this issue Apr 19, 2016
This allows providing custom argument formatters without relying on
internal APIs (#235).
@vitaut
Copy link
Contributor

vitaut commented Apr 20, 2016

BasicArgFormatter is now public and the example in #235 (comment) works, so there is no need to rely on internal APIs any more.

Here's the first stab at the docs: http://cppformat.github.io/dev/api.html#argument-formatters.

@vitaut
Copy link
Contributor

vitaut commented Apr 21, 2016

I think there should be enough information in http://cppformat.github.io/dev/api.html#argument-formatters now to implement custom argument formatters and there is an example that covers specifically hexadecimal representation of negative integers. Therefore I'm closing this.

@vitaut vitaut closed this as completed Apr 21, 2016
@polyvertex
Copy link
Contributor Author

This is great, really. I haven't had the chance to try it yet (neither I had the opportunity to ever implement the curiously recurring template pattern! which probably means I should think my designs twice) but according to your docs, all the ingredients I was missing and hopping for are there. You've made a user happy!

@vitaut
Copy link
Contributor

vitaut commented Apr 23, 2016

Thanks for the nice feedback. =)

@polyvertex
Copy link
Contributor Author

I finally had a chance to give it a try. It's exactly what I wished to achieve, thanks

@Leandros
Copy link

Leandros commented Oct 16, 2017

This is the wrong behaviour. Hexadecimal should never print negative. How can I reset the behaviour (without always casting to an unsigned type)?

@vitaut
Copy link
Contributor

vitaut commented Oct 21, 2017

Hexadecimal should never print negative.

@Leandros, why? It seems reasonable to expect signed integer to be formatted with a sign regardless of the base. Could you elaborate a bit on your use case.

How can I reset the behaviour (without always casting to an unsigned type)?

You can implement your own formatting function that formats signed integers without sign in hexadecimal as explained above.

@polyvertex
Copy link
Contributor Author

To help @Leandros a bit, here is an arg formatter I use:

// An fmt::BasicArgFormatter that forcefully casts integers to unsigned when
// the 'x' or 'X' type specifier is used.
template <typename Char, typename Spec = format_spec>
class arg_formatter : public ::fmt::BasicArgFormatter<arg_formatter<Char>, Char, Spec>
{
public:
    typedef ::fmt::BasicArgFormatter<arg_formatter<Char>, Char, Spec> SuperT;

    arg_formatter<Char, Spec>(
            ::fmt::BasicFormatter<Char, arg_formatter>& f,
            typename Spec& s, const Char* fmt)
        : SuperT(f, s, fmt) { }

    template <typename T>
    void visit_any_int(T value)
    {
        const char type_lc = this->spec().type() | 0x20; // lower case
        if (!std::is_unsigned<T>::value && (type_lc == 'x')) // || type_lc == 'b'))
        {
            typedef std::make_unsigned<T>::type unsigned_type;
            SuperT::visit_any_int<unsigned_type>(value);
        }
        else
        {
            SuperT::visit_any_int<T>(value);
        }
    }
};

@Leandros
Copy link

Leandros commented Oct 21, 2017

@vitaut Might just be me, but I connect hexadecimal output with printing the memory behind the variable. It, hence, makes zero sense to print the sign. And it also might come from me being used to use printf, where %x is doing exactly what I described:

#include <stdio.h>
int main(void) 
{
	int x = -255;
	printf("%x\n", x);
	return 0;
}

will output:

ffffff01

https://ideone.com/62jFyu

@vitaut
Copy link
Contributor

vitaut commented Oct 22, 2017

I don't have a very strong opinion on what the default behavior should be and will accept a PR to change it. However, since changing the default formatting may break clients' expectations, I suggest doing it in the std branch that will become the basis of the next major release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants