Stylesheets: implement CSS selectors specificity #170

Merged
merged 3 commits into from
May 1, 2018

Conversation

poire-z
Contributor

@poire-z poire-z commented May 1, 2018

There was already some code to process selectors by ascending order of specificity, but LVCssSelector._specificity was always 0.
We now add to it some weight depending on the kind of each LVCssSelectorRule it may include.
See #167 (comment)
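
As a rough illustration of the idea (not the actual crengine code), here is a minimal sketch that sums a per-rule weight into a selector's specificity. The `RuleKind` enum and both function names are hypothetical; the three 8-bit slots follow the `00000000|IDIDIDID|CLSCLSCL|ELELELEL` layout described later in this conversation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified rule kinds (crengine's real enum has more cases).
enum class RuleKind { Universal, Element, ClassOrAttr, Id };

// Weight contributed by one rule, using three 8-bit slots:
// id in bits 16-23, class/attribute in bits 8-15, element in bits 0-7.
inline std::uint32_t ruleWeight(RuleKind kind) {
    switch (kind) {
        case RuleKind::Id:          return 1u << 16;
        case RuleKind::ClassOrAttr: return 1u << 8;
        case RuleKind::Element:     return 1u;
        case RuleKind::Universal:   return 0u;
    }
    return 0u; // not reached for valid kinds
}

// A selector's specificity is the sum of the weights of its rules.
inline std::uint32_t selectorSpecificity(const std::vector<RuleKind>& rules) {
    std::uint32_t specificity = 0;
    for (RuleKind kind : rules) {
        specificity += ruleWeight(kind);
    }
    return specificity;
}
```

With one id, two classes and three element names, this sketch yields 0x00010203, while a lone universal selector stays at 0.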

I haven't tested much, nor looked much at the selector order navigation; I just added these simple rules, and the 2 test cases in koreader/koreader#2841 seem OK:

Before => after

[screenshot: before] => [screenshot: after]

[screenshot: before] => [screenshot: after]

// and don't have a selector here.
switch (_type)
{
case cssrt_id: // E#id

Member

Missing an indentation level here /whistles

Contributor Author

I know, and I would have put the { on the switch line.
But as it is, it's just similar to the other code just above or below in that file.

Member

I would say don't copy the wrong indentation on the other code. But where?

Member

@Frenzie Frenzie May 1, 2018

Good here:

switch (n1) {
    case 0:
        buf<<(lUInt32) css_val_px;
        buf<<(lUInt32) 1;
        break;
    case 1:
        buf<<(lUInt32) css_val_px;
        buf<<(lUInt32) 3;
        break;
    case 2:
        buf<<(lUInt32) css_val_px;
        buf<<(lUInt32) 5;
        break;
    case 3:
        buf<<(lUInt32) css_val_px;
        buf<<(lUInt32) 3;
        break;
    case 4:
        buf<<(lUInt32) css_val_inherited;
        buf<<(lUInt32) 0;
        break;
    default:break;
}

switch (i)
{
    case 1: len[1] = len[0]; /* fall through */
    case 2: len[2] = len[0]; /* fall through */
    case 3: len[3] = len[1];
}

switch (i)
{
    case 1: len[1] = len[0]; /* fall through */
    case 2: len[2] = len[0]; /* fall through */
    case 3: len[3] = len[1];
}

switch (sum) {
    case 1:
        {
            buf<<(lUInt32) (prop_code | parse_important(decl));
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n1;
        }
        break;
    case 2:
        {
            buf<<(lUInt32) (prop_code | parse_important(decl));
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n2;
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n2;
        }
        break;
    case 3:
        {
            buf<<(lUInt32) (prop_code | parse_important(decl));
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n2;
            buf<<(lUInt32) n3;
            buf<<(lUInt32) n2;
        }
        break;
    case 4:
        {
            buf<<(lUInt32) (prop_code | parse_important(decl));
            buf<<(lUInt32) n1;
            buf<<(lUInt32) n2;
            buf<<(lUInt32) n3;
            buf<<(lUInt32) n4;
        }
        break;
    default:break;
}

I could go on, but I can't find what you're talking about. :-P

Contributor Author

switch (prop_code)
{
case cssd_display:
    style->Apply( (css_display_t) *p++, &style->display, imp_bit_display, is_important );

switch (_type)
{
case cssrt_parent: // E > F
    //
    {

Member

@Frenzie Frenzie May 1, 2018

Weird. Anyway, both forms are present in the code and we both agree it should be indented so there's nothing to discuss really, is there. :-P

(No need to change the old stuff of course.)

@Frenzie
Member

Frenzie commented May 1, 2018

Great work! :-)

@poire-z
Contributor Author

poire-z commented May 1, 2018

Should I include that too in the not-yet-merged base PR, or wait a day?

@Frenzie
Member

Frenzie commented May 1, 2018

Whichever works for you. Unless you want me to do the bumping, then it'll be tomorrow or the day after. :-P

@poire-z
Contributor Author

poire-z commented May 1, 2018

OK, done.
Added _specificity to hash, in case one day we changed the weighting.

Weird that the compiler let me make that error (it compiled fine, but the hash changed at each page turn, probably because that ghost _weight thing was changing every time...). OK, I had initially added a _weight, which I ended up not using... Cleaning up...

lUInt32 LVCssSelectorRule::getHash()
{
    lUInt32 hash = 0;
    hash = ( ( ( ( (lUInt32)_type * 31
        + (lUInt32)_id ) *31 )
        + (lUInt32)_weight ) *31 ) // there's no such _weight member!
        + (lUInt32)_attrid * 31 )
        + ::getHash(_value);
    return hash;
}

@poire-z
Contributor Author

poire-z commented May 1, 2018

I think this will need a small fix, outside of LVCssSelectorRule. I forgot to add 1 for the main element (if any):
if the LVCssSelector has a non-zero _id (it has an element): add 1. So that the universal '*' can be 0 and have less specificity.

@poire-z
Contributor Author

poire-z commented Jan 12, 2020

A specificity bug and an ordering issue noticed while testing with https://www.brunildo.org/test/IEASpec.html.
Previous post / #171 added a +1, but we then should have removed another +1 elsewhere.

--- a/crengine/src/lvstsheet.cpp
+++ b/crengine/src/lvstsheet.cpp
@@ -2321,7 +2321,9 @@ lUInt32 LVCssSelectorRule::getWeight() {
         case cssrt_predecessor:   // E + F
         case cssrt_predsibling:   // E ~ F
             // But not when they don't have an element (_id=0)
-            return _id != 0 ? 1 : 0;
+            // return _id != 0 ? 1 : 0;
+            // But we already added it in LVCssSelector::parse()
+            return 0;
             break;
         case cssrt_universal:     // *
             return 0;

And we have a little issue where we lose the order of CSS selectors.
Pasting some early fix explaining this (to not lose it), that I may rework:

--- a/crengine/src/lvstsheet.cpp
+++ b/crengine/src/lvstsheet.cpp
@@ -3214,19 +3217,39 @@ void LVStyleSheet::apply( const ldomNode * node, css_style_rec_t * style )
     LVCssSelector * selector_0 = _selectors[0];
     LVCssSelector * selector_id = id>0 && id<_selectors.length() ? _selectors[id] : NULL;

     for (;;)
     {
         if (selector_0!=NULL)
         {
-            if (selector_id==NULL || selector_0->getSpecificity() < selector_id->getSpecificity() )
+            // Note that by splitting elements in selector_0 and selector_id (for performance reasons,
+            // segmenting the selectors by their start element ID - A, SPAN, P...), we lose a bit
+            // of their ordering in the stylesheet, and we may apply selectors with the same
+            // specificity in a different order than the one in the stylesheet.
+            // e.g with https://www.brunildo.org/test/IEASpec.html
+            //     <style type="text/css">
+            //       p.c12    { color: red; }   /* 0.0.1.1 specificity, goes into selector_id  */
+            //       div .c14 { color: red; }   /* 0.0.1.1 specificity, goes into selector_0   */
+            //       div .c12 { color: green; } /* 0.0.1.1 specificity, goes into selector_0   */
+            //       p.c14    { color: green; } /* 0.0.1.1 specificity, goes into selector_id  */
+            //     </style>
+            //   <div><p class="c12">el.class (red), el .class (green)  (wrong in IE/Win, IE/Mac and Op7-)</p></div>
+            //   <div><p class="c14">el .class (red), el.class (green)</p></div>
+            // Both text should be green (because green are specified in late selectors in the CSS, but
+            // depending on whether we use '<' or '<=' in the next line, one of them will be red...
+            // Let's use '<=' so "p.c12" gets applied after "div .c12" (they have the same specificity
+            // per specs, but let's have those with an element last have a little more) - so, we'll
+            // be in the situation of "wrong in IE/Win, IE/Mac and Op7").
+            if (selector_id==NULL || selector_0->getSpecificity() <= selector_id->getSpecificity() )
             {
                 // step by sel_0
                 selector_0->apply( node, style );
                 selector_0 = selector_0->getNext();
             }
             else
             {
                 // step by sel_id
                 selector_id->apply( node, style );
                 selector_id = selector_id->getNext();
             }

We are currently storing the specificity in a lUInt32, the 3 numbers being stored each in a byte.
I think we could shrink each of them from 256 values to 16 (or 64), put them on the most significant side of the lUInt32, and free a few bits/bytes on the least significant side to store a counter value: in effect, counting the selectors as we process them, and making that order part of the specificity.

Anything wrong with that, that you could think of ?
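
The tie-ordering problem described in the diff comment can be reproduced outside crengine with a small sketch. This is only an illustration under assumptions: `Sel` and `applyOrder` are hypothetical names, `list0` stands for the selectors reached through `selector_0`, `listId` for those reached through `selector_id`, and the 0.0.1.1 specificity is written as 0x0101.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Sel {
    std::string text;
    std::uint32_t specificity;
};

// Merge two lists that are each sorted by ascending specificity and return
// the order in which selectors would be applied. On equal specificity,
// list0 is stepped first when tieGoesToList0 is true (the "<=" variant),
// otherwise listId is stepped first (the "<" variant).
inline std::vector<std::string> applyOrder(const std::vector<Sel>& list0,
                                           const std::vector<Sel>& listId,
                                           bool tieGoesToList0) {
    std::vector<std::string> order;
    std::size_t i = 0, j = 0;
    while (i < list0.size() || j < listId.size()) {
        bool takeList0;
        if (i == list0.size()) {
            takeList0 = false;
        } else if (j == listId.size()) {
            takeList0 = true;
        } else if (tieGoesToList0) {
            takeList0 = list0[i].specificity <= listId[j].specificity;
        } else {
            takeList0 = list0[i].specificity < listId[j].specificity;
        }
        order.push_back(takeList0 ? list0[i++].text : listId[j++].text);
    }
    return order;
}
```

With the four rules from the test page, the "<=" variant applies both `div` rules before both `p` rules, so `p.c12` (red) lands after `div .c12` (green). The "<" variant reverses the two groups, so `div .c14` (red) lands after `p.c14` (green). Either way one paragraph ends up red, which is why a parsing-order counter is proposed.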

@NiLuJe
Member

NiLuJe commented Jan 12, 2020

Essentially make the specificity part a bit-field in however much space is needed for that? Sounds fine, AFAICT :).

@poire-z
Contributor Author

poire-z commented Jan 12, 2020

Yep.
Given the hard time I had verifying these numbers (checking with python bin(1<<23) and counting 0s and 1s...), if you're at ease with bit shifts, can you confirm I got them right (the <<shift vs the "allow for N") in:

// We are storing specificity in a lUInt32.
// We also want to store in its lower bits some counter to ensure
// selectors with the same specificity keep the order we've seen
// when parsing them.
// So, apply the real CSS specificity in higher bits, allowing
// for the below number of such rules in a single selector
// (we're not checking for overflow thus...)
#define CSS_SPECIFICITY_ID         1<<28 // allow for 8 #id (b in comment below)
#define CSS_SPECIFICITY_ATTR_CLASS 1<<23 // allow for 32 .class and [attr...] (c)
#define CSS_SPECIFICITY_ELEMENT    1<<18 // allow for 32 element names div > p span (d)
// This allows for counting 1<<18 (262144) selectors and storing
// its position in the 18 lower bits  (we're not checking for overflow either).

We'll be doing stuff like _specificity += CSS_SPECIFICITY_ELEMENT when we meet rules of one of these types in a same selector.

@NiLuJe
Member

NiLuJe commented Jan 12, 2020

Hmm, I'll admit that I was thinking in terms of an actual bitmask (so, one value per flag, and then OR'ing stuff around), but if I get the full picture (which I very well might not be :D), that doesn't quite work here...

If, on the other hand, you want to chuck 3 different (and possibly arbitrary) values in a single int, you can let the compiler do the bit twiddling for you with a union:

typedef union
{
	uint32_t _x;
	struct
	{
		uint8_t id;
		uint8_t class;
		uint16_t element;
	} specificity;
} CSSSpec;

Then you can just set/get specificity.id, specificity.class, etc.

Unless there's a C++ shenanigan that craps on my parade ;p.


EDIT:

Except, if I followed, you want 4 elements, in which case that leaves only 8 bits for a counter if everything's made an uint8_t...

Perhaps a mix'n match of both approaches, with a bitfield somewhere for some of that stuff in order to have an extra 8 bits available?

@poire-z
Contributor Author

poire-z commented Jan 12, 2020

Nope, I don't really need that.
I don't really need access to individual parts. I just need to compute a single lUInt32, let's call it a weight, so that I can insert a selector with that weight into a list of selectors, keeping them ordered by weight in that list. (Well, when I write "I", I mean "crengine already does that" :).

For a single selector, I'll be adding multiple weight-parts as I'm parsing the CSS.
For example, parsing DIV#toto > P SPAN.tutu .titi, I'll be adding (while parsing):

DIV:   +CSS_SPECIFICITY_ELEMENT
#toto: +CSS_SPECIFICITY_ID
> :    +0
P:     +CSS_SPECIFICITY_ELEMENT
SPAN:  +CSS_SPECIFICITY_ELEMENT
.tutu: +CSS_SPECIFICITY_ATTR_CLASS
.titi: +CSS_SPECIFICITY_ATTR_CLASS

According to the specificity rules, that makes a specificity of (1, 2, 3) (a=1, b=2, c=3).
(1, 0, 0) has more specificity than (0, 7, 9)

Currently, we are storing these 3 numbers in a lUInt32, each in an 8-bit slot, which allows up to 255 ids + 255 class names + 255 elements in a selector: a helluva lot more than what we might see in classic CSS, even complex ones:
00000000|IDIDIDID|CLSCLSCL|ELELELEL
The above example would give me 0x00010203.

I now also want to store in that lUInt32 the sequence number of the selector within the whole set of selectors I'm parsing (incrementing that number each time I add a selector), and it can be more than 256 (which is all that the 8 bits left in that lUInt32 can hold).
The idea is that if I see 2 selectors with the same specificity, the earliest one should be applied first. So, I'm just adding that count to the hacky specificity. If the 1247th selector I'm parsing has (1,2,3), and the 1492nd too, I'll have these hacky specificities:
(1, 2, 3, 1247)
(1, 2, 3, 1492)
I want to store the 4-tuple in the lUInt32, but the 8-bit slot left is not wide enough.
So, I'm going with:
IDI|CLSCL|ELELE|countcountcountcoun (3 bits, 5 bits, 5 bits, 19 bits)

And I seem to have my bit maths wrong :|

>>> bin((1<<28) + (1<<23) + (1<<18))
'0b10000100001000000000000000000'
'0b00010000,10000100,00000000,00000000'

should it be?:

>>> bin((1<<29) + (1<<24) + (1<<19))
'0b100001000010000000000000000000'
'0b00100001,00001000,00000000,00000000'

(earlier this afternoon, 1<<28... was feeling right :)

Anyway, bit fields might help with readability (and my maths :), allowing me to set individual bit segments while still being able to use the lUInt32 as a single int for comparisons.

- _specificity += CSS_SPECIFICITY_ID
+ _specificity.id += 1
- _specificity += CSS_SPECIFICITY_ATTR_CLASS
+ _specificity.cls_attr += 1

but not strictly necessary, if you can just confirm which of bin(1<<28) or bin(1<<29) is right :)

@poire-z
Contributor Author

poire-z commented Jan 12, 2020

OK, re-rechecked, and it must be bin((1<<29) + (1<<24) + (1<<19)).

#define CSS_SPECIFICITY_ID         1<<29 // allow for 8 #id (b in comment below)
#define CSS_SPECIFICITY_ATTR_CLASS 1<<24 // allow for 32 .class and [attr...] (c)
#define CSS_SPECIFICITY_ELEMENT    1<<19 // allow for 32 element names div > p span (d)
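
To make the accumulation concrete, here is a hedged sketch of how the `DIV#toto > P SPAN.tutu .titi` example from earlier in the thread would add up with these shifts. The `k...` constants and `exampleSelectorKey` are hypothetical stand-ins for the macros above, and seeding the value with the sequence number is an assumption about how the counter would be combined.

```cpp
#include <cstdint>

// Same values as the macros above: 3 bits for ids, 5 for classes/attributes,
// 5 for element names, leaving the 19 low bits for the parsing-order counter.
constexpr std::uint32_t kSpecificityId        = 1u << 29;
constexpr std::uint32_t kSpecificityAttrClass = 1u << 24;
constexpr std::uint32_t kSpecificityElement   = 1u << 19;

// Sort key for "DIV#toto > P SPAN.tutu .titi" when it is the
// sequenceNumber-th selector seen while parsing the stylesheet.
inline std::uint32_t exampleSelectorKey(std::uint32_t sequenceNumber) {
    std::uint32_t specificity = sequenceNumber; // low 19 bits: parsing order
    specificity += kSpecificityElement;         // DIV
    specificity += kSpecificityId;              // #toto
    specificity += kSpecificityElement;         // P
    specificity += kSpecificityElement;         // SPAN
    specificity += kSpecificityAttrClass;       // .tutu
    specificity += kSpecificityAttrClass;       // .titi
    return specificity;
}

static_assert(kSpecificityId + 2 * kSpecificityAttrClass + 3 * kSpecificityElement == 0x22180000u,
              "(1, 2, 3) packs to 0x22180000");
static_assert(kSpecificityId > 7 * kSpecificityAttrClass + 9 * kSpecificityElement,
              "(1, 0, 0) outranks (0, 7, 9)");
```

Two selectors with the same (1, 2, 3) specificity then differ only in their low bits, so the one parsed as number 1247 sorts before the one parsed as number 1492.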

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

32 - 3 = 29 - 5 = 24 - 5 = 19

2^3 = 8
2^5 = 32
2^19 = 524288

Napkin maths checks out ;).
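
The same napkin maths can be handed to the compiler, along with a look at what "not checking for overflow" means in practice. This is a sketch only; the `k...` names and `addRepeatedly` are hypothetical.

```cpp
#include <cstdint>

// Shifts settled on in the thread (the k* names are hypothetical).
constexpr std::uint32_t kIdWeight        = 1u << 29; // 3-bit field
constexpr std::uint32_t kAttrClassWeight = 1u << 24; // 5-bit field
constexpr std::uint32_t kElementWeight   = 1u << 19; // 5-bit field, above a 19-bit counter

static_assert(3 + 5 + 5 + 19 == 32, "the four fields fill a 32-bit value exactly");
static_assert((1u << 3) == 8 && (1u << 5) == 32 && (1u << 19) == 524288,
              "each n-bit field holds 2^n distinct values, 0 included");

// Unchecked accumulation: add `weight` to a running total `times` times,
// the way _specificity += CSS_SPECIFICITY_... would while parsing.
inline std::uint32_t addRepeatedly(std::uint32_t weight, unsigned times) {
    std::uint32_t total = 0;
    for (unsigned i = 0; i < times; ++i) {
        total += weight; // unsigned arithmetic wraps modulo 2^32
    }
    return total;
}
```

Since 0 is one of the values, each field counts safely up to 7, 31 and 31: in this sketch a 32nd element name carries into the class field (it equals one class), a 32nd class equals one id, and an 8th id wraps the whole value to 0.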

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Also, my bad, I said bitfield (twice) earlier when I meant bitmask :D.

Although, yeah, GCC bitfields might do the trick in a union, too. Never tried it for anything fancier than faking a uint24_t ;).

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Random PoC:

#include <stdio.h>
#include <stdint.h>

typedef union
{
	uint32_t specificity;
	struct
	{
		uint8_t id:3;
		uint8_t class:5;
		uint8_t element:5;
		uint32_t count:19;
	} spec;
} CSSSpec;

int main(void)
{
	CSSSpec test = { 0 };
	printf("sizeof(test): %zu\n", sizeof(test));

	test.spec.id = 1U;
	test.spec.class = 2U;
	test.spec.element = 3U;
	test.spec.count = 1U;

	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test.spec.id, test.spec.class, test.spec.element, test.spec.count, test.specificity);

	CSSSpec test2 = test;
	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test2.spec.id, test2.spec.class, test2.spec.element, test2.spec.count, test2.specificity);

	CSSSpec test3 = { .specificity = 0x2311 };
	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test3.spec.id, test3.spec.class, test3.spec.element, test3.spec.count, test3.specificity);
}
sizeof(test): 4
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311

Seems to work out okay ;).

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

With constants, Clang shouts at you if there's an overflow, but since everything's unsigned, it wraps around:

css.c:21:15: warning: implicit truncation from 'unsigned int' to bit-field changes value from 8 to 0 [-Wbitfield-constant-conversion]
        test.spec.id = 8U;
                     ^ ~~
1 warning generated.

Runtime behavior should be identical (i.e, wraparound).
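
The run-time wraparound can be checked with a small sketch. It assumes the truncating behaviour GCC and Clang give unsigned bit-fields; `SpecFields`, `storeId` and `bumpClass` are hypothetical names, and the field order is just one of the two layouts tried in this thread.

```cpp
#include <cstdint>

struct SpecFields {
    std::uint32_t count   : 19;
    std::uint32_t element : 5;
    std::uint32_t cls     : 5; // "class" is a keyword in C++
    std::uint32_t id      : 3;
};

// Store a run-time value into the 3-bit id field and read it back.
inline unsigned storeId(unsigned value) {
    SpecFields fields = SpecFields(); // all fields zero
    fields.id = value;                // only the low 3 bits are kept
    return fields.id;
}

// Increment the 5-bit class field by one.
inline SpecFields bumpClass(SpecFields fields) {
    fields.cls += 1; // wraps within the field, never touches id
    return fields;
}
```

One difference from adding shifted constants to a plain integer: here an overflowing field wraps to 0 by itself, without carrying into its neighbour.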

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311

I'd be expecting full: 0x22180001 :)

|001|00010|00011|0000000000000000001|
>>> len("00100010000110000000000000000001")
32
>>> hex(0b00100010000110000000000000000001)
'0x22180001'

(But don't bother, except for fun/science - I'm done with it, having it working by using the #define, and I don't want to change too much of crengine lvstsheet to introduce this type/struct - although it indeed looks nicer :)

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Endianness ;).

#include <stdio.h>
#include <stdint.h>

typedef union
{
	uint32_t specificity;
	struct
	{
		uint8_t id:3;
		uint8_t class:5;
		uint8_t element:5;
		uint32_t count:19;
	} spec;
} CSSSpec;

typedef union
{
	uint32_t specificity;
	struct
	{
		uint32_t count:19;
		uint8_t element:5;
		uint8_t class:5;
		uint8_t id:3;
	} spec;
} CSSSpec2;

int main(void)
{
	CSSSpec test = { 0 };
	printf("sizeof(test): %zu\n", sizeof(test));

	test.spec.id = 1U;
	test.spec.class = 2U;
	test.spec.element = 3U;
	test.spec.count = 1U;

	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test.spec.id, test.spec.class, test.spec.element, test.spec.count, test.specificity);

	CSSSpec test2 = test;
	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test2.spec.id, test2.spec.class, test2.spec.element, test2.spec.count, test2.specificity);

	CSSSpec test3 = { .specificity = 0x2311 };
	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test3.spec.id, test3.spec.class, test3.spec.element, test3.spec.count, test3.specificity);

	CSSSpec2 test4 = { .spec.id = test.spec.id, .spec.class = test.spec.class, .spec.element = test.spec.element, .spec.count = test.spec.count };
	printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test4.spec.id, test4.spec.class, test4.spec.element, test4.spec.count, test4.specificity);
}

sizeof(test): 4
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x22180001

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Endianness ;).

Ok :) so may be an argument to not go with that, as I would worry about the order in the bits stuff themselves - I'd fear getting on some archs:

|100|01000|11000|1000000000000000000|
instead of the expected:
|001|00010|00011|0000000000000000001|

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Don't worry. Unless you'll be feeding full int constants, that's abstracted for you ;). (i.e., everything will be BE or everything will be LE, but everything will be in the "right" place for the current endianness, you don't actually care about the internal layout unless you fiddle with the full int directly from another endianness).

(i.e., a left shift always shifts left, a right shift always shifts right, that's the magic of bit twiddling).

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Generally, you don't actually care about byte-order. That's only a potential issue when you have to deal with a stream of data that may be encoded in another endianness (that generally encompasses dealing with networking stuff at the driver level, or when dealing with file formats, and that's pretty much it?).
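
A short sketch of that distinction, with hypothetical helper names: shifts operate on the value, so an explicit byte-by-byte encoding gives the same result on any host, and only a raw memory copy exposes the host byte order.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Encode a value as big-endian bytes using shifts only.
// The result is the same on little- and big-endian hosts.
inline std::array<std::uint8_t, 4> toBigEndianBytes(std::uint32_t value) {
    return {{static_cast<std::uint8_t>(value >> 24),
             static_cast<std::uint8_t>(value >> 16),
             static_cast<std::uint8_t>(value >> 8),
             static_cast<std::uint8_t>(value)}};
}

// Decode big-endian bytes back into a value, again with shifts only.
inline std::uint32_t fromBigEndianBytes(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}

// Raw in-memory bytes: the only view here that depends on the host.
inline std::array<std::uint8_t, 4> inMemoryBytes(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes;
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}
```

Comparing or shifting 0x22180001 behaves identically everywhere; `inMemoryBytes` is what would differ between a little-endian and a big-endian machine.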

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Ok, I get the _specificity.class += 1 would just work.
But you would still need to define the 2 variations of the struct, and #ifdef IS_BIG_ENDIAN one #else the other, right? (if we were to support both endiannesses - I remember I worried about that for some other stuff, and we went with just one that worked just right as-is on my linux and my kobo :).

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Nope, nothing's crossing endianness here, whichever version you choose is up to you. You don't care about the internal layout ;). (The first one just felt more "natural" to me, since, again, don't care about the internal layout ;p).

(FWIW, I'm pretty sure no-one's ever actually run KOReader on a BE machine anyway ^^. The only valid and current example I can think of is IBM's Power9 (i.e., PPC 64), although they technically can run either in BE or LE, they just happen to default to BE for reasons (legacy?)).

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Nope, nothing's crossing endianness here, whichever version you choose is up to you. You don't care about the internal layout ;). (The first one just felt more "natural" to me, since, again, don't care about the internal layout ;p).

But I do care about the resulting lUInt32 :) 0x22180001 is right, 0x00002311 is wrong :)
So, one would have to bits-layout-think (or just test :) that to decide between your 2 structs.

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Which then begs the question: why, exactly, do you care about the full int? ;).

(I'd expect for stuff to basically check counter if id && class && elem match or something. I can't quite see why you'd want to rely on the full int, except to serialize/deserialize it somewhere in which case endianness won't matter, as everything's done on the same machine).

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Also, more bonus points for GCC bitfields: ARM has a fun set of dedicated bitfield instructions. I'd imagine the compiler has an easier time being convinced to use those when actual bitfields are used instead of manual bit twiddling ;).

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Which then begs the question: why, exactly, do you care about the full int? ;).

Existing code does :) The whole point is to apply rules in increasing specificity order (so, it sorts them at parsing time, so it does not have to do that at each apply time):

// place rules to sheet
for (LVCssSelector * p = selector; p; )
{
    LVCssSelector * item = p;
    p=p->getNext();
    lUInt16 id = item->getElementNameId();
    if (_selectors.length()<=id)
        _selectors.set(id, NULL);
    // insert with specificity sorting
    if ( _selectors[id] == NULL
        || _selectors[id]->getSpecificity() > item->getSpecificity() )
    {
        // insert as first item
        item->setNext( _selectors[id] );
        _selectors[id] = item;
    }
    else
    {
        // insert as internal item
        for (LVCssSelector * p = _selectors[id]; p; p = p->getNext() )
        {
            if ( p->getNext() == NULL
                || p->getNext()->getSpecificity() > item->getSpecificity() )
            {
                item->setNext( p->getNext() );
                p->setNext( item );
                break;
            }
        }
    }
}

Or you'd have to add lt/gt wrapping methods to your struct so it computes some comparison weight internally - but then, the resulting lUInt32 is just that weight, at no extra cost.
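
To illustrate that point, here is a sketch (hypothetical `Specificity` struct and `pack` helper, using the 3/5/5/19 layout from this thread) in which a field-by-field comparison and a comparison of the packed integers give the same ordering, as long as every field stays within its bit width.

```cpp
#include <cstdint>
#include <tuple>

struct Specificity {
    unsigned id;      // expected range 0..7
    unsigned cls;     // expected range 0..31
    unsigned element; // expected range 0..31
    unsigned count;   // expected range 0..524287
};

// The "lt wrapping method": lexicographic comparison, most significant field first.
inline bool operator<(const Specificity& a, const Specificity& b) {
    return std::tie(a.id, a.cls, a.element, a.count) <
           std::tie(b.id, b.cls, b.element, b.count);
}

// The single-integer alternative: the same fields packed from the top bits down.
inline std::uint32_t pack(const Specificity& s) {
    return (static_cast<std::uint32_t>(s.id) << 29) |
           (static_cast<std::uint32_t>(s.cls) << 24) |
           (static_cast<std::uint32_t>(s.element) << 19) |
            static_cast<std::uint32_t>(s.count);
}
```

Because the most significant field sits in the highest bits, comparing the packed values is a single integer comparison that agrees with the tuple comparison.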

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

I was afraid it was going to be something like that ;p.

That's admittedly already broken on BE, though, isn't it?

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Not as long as we're using a simple lUInt32, to which we just add numbers (I add the number that 1<<29 gives, I'm not shifting anything) - and I just compare these numbers.
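
Since only whole integers are compared, the insertion loop quoted earlier in this conversation can be mimicked with standard containers. This is a simplified sketch, not crengine code: `Selector` and `insertSorted` are hypothetical, a `std::vector` replaces the linked list, and the specificities use the older 8-bit-per-slot values.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct Selector {
    std::string text;
    std::uint32_t specificity;
};

// Insert before the first entry with a strictly greater specificity, so an
// item goes after existing entries of equal specificity, like the loop that
// tests getNext()->getSpecificity() > item->getSpecificity().
inline void insertSorted(std::vector<Selector>& sorted, const Selector& item) {
    auto position = std::upper_bound(
        sorted.begin(), sorted.end(), item,
        [](const Selector& lhs, const Selector& rhs) {
            return lhs.specificity < rhs.specificity;
        });
    sorted.insert(position, item);
}
```

Nothing here looks at individual bytes, so the result does not depend on the host byte order.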

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Hmm, if we're reading a uint, and not raw memory (i.e., a pointer to the first bytes of that uint), endianness doesn't actually matter?

I think?

My brain hurts >_<".

There's also the fact that C++ forbids type-punning, so I'm not sure my C union shenanigans are actually doable in C++...

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

No, that works in C++, yay.

(Mostly, class is a reserved keyword, duh. And C++ doesn't do designated aggregate list init o_O).

#include <cstdio>
#include <cstdint>

typedef union
{
        uint32_t specificity;
        struct
        {
                uint8_t id:3;
                uint8_t cl:5;
                uint8_t element:5;
                uint32_t count:19;
        } spec;
} CSSSpec;

typedef union
{
        uint32_t specificity;
        struct
        {
                uint32_t count:19;
                uint8_t element:5;
                uint8_t cl:5;
                uint8_t id:3;
        } spec;
} CSSSpec2;

int main(void)
{
        CSSSpec test = { 0 };
        printf("sizeof(test): %zu\n", sizeof(test));

        test.spec.id = 1U;
        test.spec.cl = 2U;
        test.spec.element = 3U;
        test.spec.count = 1U;

        printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test.spec.id, test.spec.cl, test.spec.element, test.spec.count, test.specificity);

        CSSSpec test2 = test;
        printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test2.spec.id, test2.spec.cl, test2.spec.element, test2.spec.count, test2.specificity);

        CSSSpec test3 = { .specificity = 0x2311 };
        printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test3.spec.id, test3.spec.cl, test3.spec.element, test3.spec.count, test3.specificity);

        CSSSpec2 test4 = { 0 };
        test4.spec.id = test.spec.id;
        test4.spec.cl = test.spec.cl;
        test4.spec.element = test.spec.element;
        test4.spec.count = test.spec.count;
        printf("id: %hhu // class: %hhu // element: %hhu // count: %u // full: %#.8x\n", test4.spec.id, test4.spec.cl, test4.spec.element, test4.spec.count, test4.specificity);

        // Mind twist!
        const unsigned char *pointer = (const unsigned char *) &test.specificity; // unsigned, so bytes >= 0x80 are not sign-extended
        printf("full (in memory, byte by byte): 0x");
        for (int32_t i = 0; i < 4; i++)
        {
                printf("%02x", (uint32_t) pointer[i]);
        }
        printf("\n");
}

With the LE mind-twist:

sizeof(test): 4
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x00002311
id: 1 // class: 2 // element: 3 // count: 1 // full: 0x22180001
full (in memory, byte by byte): 0x11230000

@poire-z
Contributor Author

poire-z commented Jan 13, 2020

Hmm, if we're reading a uint, and not raw memory (i.e., a pointer to the first bytes of that uint), endianness doesn't actually matter?

But we never use/access a subset of that lUInt32. We always use it as a full lUInt32 (a single 4-byte integer).
Me wanting to use it as 4 slots of bits might have confused you, sorry :) It is just an abstraction for how I increment it.

@NiLuJe
Member

NiLuJe commented Jan 13, 2020

Then I'm pretty sure endianness is either irrelevant or already broken ;). My vote would be irrelevant :).
