How to effectively store binary data? #898

Closed
rexdf opened this issue Jan 2, 2018 · 4 comments
Comments

@rexdf

rexdf commented Jan 2, 2018

I have some data, uint8_t my_data[26], which I want to put into a json object once and then fetch quite frequently. I never need to dump the json object.

I tried converting it to std::string, but that does not work (with rapidjson it seems OK). Converting to and from std::vector seems quite expensive. I also need my_data to have the same lifetime as the json object. Is there a better way to solve this?

I thought I might be able to get std::string::data() or std::vector::data() directly from the json object, but I could not find the right way to do it.

@gregmarr
Contributor

gregmarr commented Jan 2, 2018

The data in a std::string in json needs to be a valid UTF-8 string, which an array of arbitrary bytes may well not be. I'm not sure json is the right place for it if you need to access it regularly as uint8_t my_data[26] (or, more idiomatically, std::array<uint8_t, 26>). Could you just put the json and my_data in a struct, so that their lifetimes are the same but you aren't trying to put arbitrary bytes into the json object?

@rexdf
Author

rexdf commented Jan 3, 2018

Well, I am just using json the way other languages use their basic data structures. The json object is quite large and deeply nested, and it contains many my_data arrays. Is std::array used internally as a basic data type in nlohmann/json? It would be nice if it were! Maybe I should just try putting a pointer from malloc into the json object.

@gregmarr
Contributor

gregmarr commented Jan 3, 2018

No, arrays are stored as a std::vector of json values. The std::array class has a fixed size, so it would be inappropriate for use inside json.

You should not be trying to put arbitrary pointers into the json object. That way lies madness.

I'm not sure how one would map an arbitrary array of bytes into json such that it can be retrieved frequently without the expense of extracting it from a std::vector each time. I suppose you could base64-encode it, so that it becomes a string, but you would still need to decode it on every access.
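A minimal base64 sketch for the approach suggested above, using the standard RFC 4648 alphabet with '=' padding. base64_encode and base64_decode are hypothetical helpers, not nlohmann/json functions. The encoded form is plain ASCII, so it is always a valid json string, but as noted every fetch still pays for a decode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encode n bytes as base64 (standard alphabet, '=' padding).
inline std::string base64_encode(const std::uint8_t* data, std::size_t n)
{
    static const char* tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((n + 2) / 3 * 4);
    for (std::size_t i = 0; i < n; i += 3) {
        // Pack up to three bytes into one 24-bit group.
        std::uint32_t v = std::uint32_t(data[i]) << 16;
        if (i + 1 < n) v |= std::uint32_t(data[i + 1]) << 8;
        if (i + 2 < n) v |= std::uint32_t(data[i + 2]);
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out += tbl[(v >> 18) & 0x3F];
        out += tbl[(v >> 12) & 0x3F];
        out += (i + 1 < n) ? tbl[(v >> 6) & 0x3F] : '=';
        out += (i + 2 < n) ? tbl[v & 0x3F] : '=';
    }
    return out;
}

// Decode a padded base64 string back into bytes; throws on malformed input.
inline std::vector<std::uint8_t> base64_decode(const std::string& s)
{
    if (s.size() % 4 != 0) throw std::invalid_argument("bad base64 length");
    auto val = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw std::invalid_argument("bad base64 character");
    };
    std::vector<std::uint8_t> out;
    out.reserve(s.size() / 4 * 3);
    for (std::size_t i = 0; i < s.size(); i += 4) {
        std::uint32_t v = (std::uint32_t(val(s[i])) << 18) |
                          (std::uint32_t(val(s[i + 1])) << 12);
        out.push_back(std::uint8_t(v >> 16));
        if (s[i + 2] != '=') {
            v |= std::uint32_t(val(s[i + 2])) << 6;
            out.push_back(std::uint8_t((v >> 8) & 0xFF));
            if (s[i + 3] != '=') {
                v |= std::uint32_t(val(s[i + 3]));
                out.push_back(std::uint8_t(v & 0xFF));
            }
        }
    }
    return out;
}
```

Usage would look like j["test1"] = base64_encode(my_data, sizeof(my_data)); once, then base64_decode(j["test1"].get<std::string>()) on each fetch.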

@rexdf
Author

rexdf commented Jan 3, 2018

```cpp
// Benchmark: three ways of reading a value back out of a json object.
#include <nlohmann/json.hpp>
#include <chrono>
#include <cstdint>
#include <cstring>   // memset
#include <iostream>
#include <string>
#include <vector>

using namespace std;
using json = nlohmann::json;

int main()
{
	uint8_t my_data[26] = {1, 2, 3, 4, 5};  // remaining 21 bytes are zero
	// "test1" becomes a json array of 26 numbers, "test2" a 26-character string.
	json j = { {"test1", my_data}, {"test2", string(26, 'a')} };
	cout << j.dump() << endl;
	const int test_times = 50000;

	// 1) Copy the string value out of the json object.
	{
		auto begin = std::chrono::high_resolution_clock::now();
		string s;
		for (int i = 0; i < test_times; i++) {
			memset(my_data, 0, sizeof(my_data));  // clears the local array only; j keeps its own copy
			s = j["test2"].get<string>();
		}
		auto end = std::chrono::high_resolution_clock::now();
		cout << s << " ";
		cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << endl;
	}

	// 2) Convert the json array (a vector of json values) element by element into a vector<uint8_t>.
	{
		auto begin = std::chrono::high_resolution_clock::now();
		vector<uint8_t> a;
		for (int i = 0; i < test_times; i++) {
			a = j["test1"].get<vector<uint8_t>>();
		}
		auto end = std::chrono::high_resolution_clock::now();
		cout << int(a[4]) << " ";
		cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << endl;
	}

	// 3) Get a pointer to the stored array without copying anything.
	//    p is never used afterwards, so the compiler may optimize parts of this loop away.
	{
		auto begin = std::chrono::high_resolution_clock::now();
		json::array_t* p = nullptr;
		for (int i = 0; i < test_times; i++) {
			p = j["test1"].get_ptr<json::array_t*>();
		}
		auto end = std::chrono::high_resolution_clock::now();
		cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << endl;
	}
	return 0;
}
```

The result (times in milliseconds for 50,000 iterations):

{"test1":[1,2,3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"test2":"aaaaaaaaaaaaaaaaaaaaaaaaaa"}
aaaaaaaaaaaaaaaaaaaaaaaaaa 1801
5 17446
658

On Linux, I get:

{"test1":[1,2,3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"test2":"aaaaaaaaaaaaaaaaaaaaaaaaaa"}
aaaaaaaaaaaaaaaaaaaaaaaaaa 39
5 536
11

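The relative costs are still about the same on both platforms: get<vector<uint8_t>>() is roughly 10x slower than get<string>(), which in turn is roughly 3x slower than get_ptr.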

rexdf closed this as completed on Jan 3, 2018

2 participants