How to serialize byteSize() #58

Open
fkgruber opened this issue Mar 24, 2019 · 5 comments

@fkgruber

Hi,
I'm trying to write multiple protobuf messages to a single file by writing each message's byteSize() before the serialized message.

How do I properly write the byte size to the file? In the Python example I see that the size is first converted to a varint using the function _VarintBytes(). How do I do that conversion in R?

thanks
FKG

@eddelbuettel
Owner

I am not sure I understand the question. ByteSize() appears to be a function of the C(++) API of Protocol Buffers returning an int.

As it returns an int you would presumably serialize that int the way you usually do. Or maybe I misunderstand what you're after in which case you could try to explain it again.
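One way to read "serialize that int the way you usually do" is a fixed-width length prefix written with base R's writeBin(). The sketch below is only an illustration under that assumption: write_framed and read_framed are hypothetical helper names, the payloads are plain raw vectors standing in for serialized messages, and a 4-byte little-endian prefix is a different format from the varint prefix used in the Python example, so files written this way would not be readable by that Python code.

```r
# Hypothetical helpers: frame each payload with a 4-byte little-endian length.
write_framed <- function(con, payload) {
  stopifnot(is.raw(payload))
  writeBin(length(payload), con, size = 4L, endian = "little")  # length prefix
  writeBin(payload, con)                                        # message bytes
}

read_framed <- function(con) {
  n <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
  if (length(n) == 0) return(NULL)  # nothing left to read
  readBin(con, what = "raw", n = n)
}

# Round trip with two stand-in payloads (one longer than 127 bytes).
tf <- tempfile(fileext = ".bin")
con <- file(tf, open = "wb")
write_framed(con, charToRaw("first message"))
write_framed(con, as.raw(rep(0x7a, 300)))
close(con)

con <- file(tf, open = "rb")
m1 <- read_framed(con)
m2 <- read_framed(con)
m3 <- read_framed(con)  # NULL once the file is exhausted
close(con)
```

With RProtoBuf the payload would presumably be the raw vector returned by serializing a message to NULL, and each payload read back could be handed to the descriptor's read method; both of those are assumptions here rather than something this thread confirms.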

@fkgruber
Author

fkgruber commented Mar 24, 2019

What I'm trying to do is very simple. Suppose I have 2 messages:

library("RProtoBuf")
## first message (referred to as p1 in the script below)
p1 = new(tutorial.Person, id = 1, name = 'Dirk')
p1$name = 'Murray'
## second message
p2 = new(tutorial.Person)
p2$id = 1
p2$name = "test"

I would like to write both messages to a protobuf file and then read them back in. My understanding is that you need to first write the length of the message and then the message. Then when you read it in you read the length and then the message.

Here is an example of this idea in python:
https://www.datadoghq.com/blog/engineering/protobuf-parsing-in-python/

Based on that example I came up with the following script that works in limited cases:

library(reticulate)
goog = import("google.protobuf.internal.decoder")
goog2 = import("google.protobuf.internal.encoder")
varintbytes = goog2$`_VarintBytes`
decode32 = goog$`_DecodeVarint32`

## write protobuf
tf <- "test_bytes.bin"
con <- file(tf, open = "wb")
p1size = p1$bytesize()
writeBin(charToRaw(intToUtf8(p1size)), con)
p1$serialize(con)
p2size = p2$bytesize()
writeBin(charToRaw(intToUtf8(p2size)), con)
p2$serialize(con)
close(con)

##read protobuf
tfr = readBin(tf, "raw", file.size(tf))
## read 1st message
n = 1
pos = decode32(tfr,as.integer(n - 1))
clen = pos[[1]]
n = pos[[2]] + 1
nend = n + clen - 1
##nend = n + clen
pdata = tfr[n:nend]
p1r = tutorial.Person$read(tfr[n:nend])
writeLines(as.character(p1r))
writeLines(as.character(p1))
## read 2nd message
n = clen + n
pos = decode32(tfr, as.integer(n - 1))
clen = pos[[1]]
n = pos[[2]] + 1
nend = n + clen - 1
pdata = tfr[n:nend]
p2r = tutorial.Person$read(tfr[n:nend])
writeLines(as.character(p2r))
writeLines(as.character(p2))

I'm replacing Python's _VarintBytes with charToRaw(intToUtf8()). This seems to work as long as the byte size is less than 128, because from 128 onward the UTF-8 encoding needs more than one byte and no longer matches the varint encoding:

> charToRaw(intToUtf8(127))
[1] 7f
> charToRaw(intToUtf8(128))
[1] c2 80

and it breaks.

For example, if I change p2's name to:
p2$name = paste0("longname", rep("tesdf", 100), collapse = "_")
it no longer works.
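For comparison, here is a minimal base-R sketch of the base-128 varint encoding that protobuf uses for lengths: 7 bits per byte, least significant group first, with the high bit set on every byte except the last. encode_varint is a hypothetical helper name and not part of RProtoBuf.

```r
# Hypothetical helper: encode a non-negative whole number as a protobuf varint.
encode_varint <- function(n) {
  stopifnot(length(n) == 1, !is.na(n), n >= 0, n == floor(n))
  out <- raw(0)
  repeat {
    low7 <- n %% 128   # lowest 7 bits of what is left
    n <- n %/% 128     # remaining higher bits
    if (n > 0) {
      out <- c(out, as.raw(low7 + 128))  # more bytes follow: set the high bit
    } else {
      out <- c(out, as.raw(low7))        # last byte: high bit clear
      break
    }
  }
  out
}

encode_varint(127)  # 7f
encode_varint(128)  # 80 01
encode_varint(300)  # ac 02
```

Under this assumption, writeBin(encode_varint(p1size), con) would take the place of writeBin(charToRaw(intToUtf8(p1size)), con) in the script above. For 128 the varint is 80 01, whereas UTF-8 gives c2 80, which is why the two only agree below 128.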

@eddelbuettel
Owner

I would look at the unit test files runit.serialize_pb.R and runit.serialize.R.

@eddelbuettel
Owner

There are also the *Stream classes, but I am not sure we have an example that serializes to a file or connection based on a proto definition. Parts were always lacking because Google did not offer RPC support until gRPC.io. You may need to cook something up based on the raw vectors.
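As a rough idea of what "something based on the raw vectors" could look like on the reading side, the sketch below decodes varint length prefixes in base R and splits a raw vector into one raw vector per message, without going through reticulate. decode_varint and split_delimited are hypothetical helper names, and the input buffer is hand-built test data rather than real serialized messages.

```r
# Hypothetical helper: decode one varint starting at 1-based position pos.
decode_varint <- function(buf, pos = 1L) {
  value <- 0
  shift <- 0
  repeat {
    if (pos > length(buf)) stop("truncated varint")
    b <- as.integer(buf[pos])
    pos <- pos + 1L
    value <- value + (b %% 128L) * 2^shift  # add the low 7 bits
    shift <- shift + 7
    if (b < 128L) break                     # high bit clear: last byte
  }
  list(value = value, next_pos = pos)
}

# Hypothetical helper: split a varint-delimited buffer into message payloads.
split_delimited <- function(buf) {
  msgs <- list()
  pos <- 1L
  while (pos <= length(buf)) {
    hdr <- decode_varint(buf, pos)
    start <- hdr$next_pos
    end <- start + hdr$value - 1
    if (end > length(buf)) stop("truncated message")
    msgs[[length(msgs) + 1L]] <- if (hdr$value > 0) buf[start:end] else raw(0)
    pos <- end + 1
  }
  msgs
}

# Test data: a 3-byte payload (prefix 03) and a 300-byte payload (prefix ac 02).
buf <- c(as.raw(0x03), charToRaw("abc"),
         as.raw(c(0xac, 0x02)), as.raw(rep(0x7a, 300)))
msgs <- split_delimited(buf)
```

Each element of msgs is a plain raw vector, so in the script earlier in this thread it could presumably be passed to tutorial.Person$read() in place of the manually sliced tfr[n:nend].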

@fkgruber
Author

I see that RProtoBuf has a function RProtoBuf::WriteVarint32. Are there any examples of how to use it?

I think that is the C++ equivalent of the Python function _VarintBytes in the example.
