This repository has been archived by the owner on Mar 11, 2021. It is now read-only.
Python:
The json library won't work directly on the file, so json.load(...) is not an option.
The following code won't work:
import json
with open('output.json', 'r') as f_in:
    data = json.load(f_in)
    # this will throw a MemoryError
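One way around the MemoryError is to parse the file one line at a time, so that only a single record is held in memory. The sketch below assumes output.json has one JSON record per line, each optionally followed by a comma, with the outer array brackets on lines of their own; that layout is an assumption about this file, and the helper name `iter_records` is hypothetical.

```python
import json


def iter_records(path):
    """Yield (line_number, parsed_record) for each line that parses as JSON.

    Assumes one record per line, optionally followed by a comma, with the
    outer array brackets on lines of their own (an assumption about the file).
    """
    with open(path, "r") as f_in:
        for line_number, line in enumerate(f_in):
            text = line.strip().rstrip(",")
            if text in ("", "[", "]"):
                continue  # skip blank lines and the outer brackets
            try:
                yield line_number, json.loads(text)
            except ValueError:
                continue  # skip lines that are not valid JSON on their own
```

Because it is a generator, a loop such as `for n, rec in iter_records('output.json'):` never builds the whole list in memory.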
The code I normally use is the following; this version was written specifically to draw a random sample:
import json
import os
import random
from collections import defaultdict

directory = "/bigdumpdata"
datred = defaultdict(int)  # not used in this sampling variant
counter = -1
sample = {}
with open(os.path.join(directory, "output.json"), "r") as f_in:
    while True:
        record = f_in.readline()
        counter += 1
        if not record:
            break  # end of file
        if len(record) > 3:  # skip very short lines
            try:
                # keep roughly 1% of the lines
                if random.uniform(0, 1) <= .01:
                    # drop the last two characters of the line before parsing
                    recordjson = json.loads(record[:-2])
                    # require at least one entry with both "name" and "completedDate"
                    rec = sorted([(r["completedDate"], r["name"])
                                  for r in recordjson
                                  if "name" in r and "completedDate" in r])
                    if rec == []:
                        continue
                    sample[counter] = recordjson
            except ValueError:
                # the line is not valid JSON on its own; skip it
                continue
with open(os.path.join(directory, "outputsample.json"), "w") as f_out:
    json.dump(sample, f_out)
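The loop above keeps each line with probability 0.01, so the size of the sample varies from run to run. If a fixed number of records is wanted instead, reservoir sampling (Algorithm R) is one possible alternative. Note that this is a different technique from the one used above, and `reservoir_sample`, `k` and `seed` are illustrative names, not part of the original code.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Only k items are held in memory, so the iterable can be a file
    object that is far larger than RAM.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # fill the reservoir with the first k items
            reservoir.append(line)
        else:
            # replace a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing the open file object (for example `reservoir_sample(f_in, 1000)`) would return 1000 raw lines, each of which can then be parsed with json.loads as in the loop above.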
R:
rjson takes a long time to load and then throws an error, after the file is converted from 3.7GB into a 5GB one:
library('rjson')
json_data <- fromJSON(file='output.json')
#Error in paste(readLines(file, warn = FALSE), collapse = "") :
#result would exceed 2^31-1 bytes
jsonlite takes a long time to load (about 10 min) but does open the file, after it is converted from 3.7GB into a 5GB one:
library(jsonlite)
json_data <- fromJSON("1_archive/output.json", flatten=TRUE)
# the result is an R list