Data below a threshold should not be stored in DataStore. (#456)
* add encodedData field to RecordsWrite in message storage under a threshold

In this commit I've utilized the existing RecordsWriteMessageWithOptionalEncodedData class
to write data from RecordsWrite messages directly into the MessageStore when the data
is below the 50k byte threshold.

For now I'm still storing in DataStore, but that will be removed in subsequent commits.

I had to take care to remove `encodedData` from messages when computing the CID, and made sure to use the
Message.getCid() helper for computing CIDs for message storage.

Additionally, I've made sure to prune the original RecordsWrite of any encodedData when cleanup happens.

* do not store data smaller than a threshold in the DataStore

In this commit I've built on the previous one by not storing data smaller than the threshold
within the DataStore and keeping it only embedded within the MessageStore.

I've mimicked the safeguards and errors from putData in processEncodedData, taking encodedData into account.
I've also implemented tests for various data sizes in certain places where I noticed it may matter.

I'm missing some additional tests that I will write in subsequent commits.

* added tests for MessagesGet and for Message getCid

* fixed broken tests that weren't running, added tests to check paths between putData and processEncodedData

* fix tests that weren't running

* more coverage for records-read

* be explicit when testing for thresholds

* refactor validation of data

* remove backward compatible code, update tests

* remove unneeded tests for backward compatibility

* remove storage controller query method, replace it with messageStore query method

* remove unneeded comment about storage controller from

* comments and variables for clarity

* comments for further clarity around encodedData

* readme update after rebase
LiranCohen authored Aug 15, 2023
1 parent 03066e5 commit b2927e6
Showing 15 changed files with 640 additions and 155 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -3,7 +3,7 @@
 # Decentralized Web Node (DWN) SDK <!-- omit in toc -->

 Code Coverage
-![Statements](https://img.shields.io/badge/statements-97.51%25-brightgreen.svg?style=flat) ![Branches](https://img.shields.io/badge/branches-94.35%25-brightgreen.svg?style=flat) ![Functions](https://img.shields.io/badge/functions-93.86%25-brightgreen.svg?style=flat) ![Lines](https://img.shields.io/badge/lines-97.51%25-brightgreen.svg?style=flat)
+![Statements](https://img.shields.io/badge/statements-97.53%25-brightgreen.svg?style=flat) ![Branches](https://img.shields.io/badge/branches-94.42%25-brightgreen.svg?style=flat) ![Functions](https://img.shields.io/badge/functions-93.88%25-brightgreen.svg?style=flat) ![Lines](https://img.shields.io/badge/lines-97.53%25-brightgreen.svg?style=flat)


 - [Introduction](#introduction)
3 changes: 2 additions & 1 deletion src/core/dwn-constant.ts
@@ -1,6 +1,7 @@
 export class DwnConstant {
 /**
 * The maximum size in bytes of raw data that will be returned as `encodedData`.
+* this is also the maximum size that we will store within MessageStore.
 */
-public static readonly maxDataSizeAllowedToBeEncoded = 10_000;
+public static readonly maxDataSizeAllowedToBeEncoded = 50_000;
 }
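The constant above now drives where record data is kept. A minimal sketch of that decision follows, assuming the threshold is inclusive (the doc comment calls it the "maximum size"). The `chooseStorageTarget` function and the `StorageTarget` type are hypothetical names used only for illustration; they are not part of the SDK.

```typescript
// Hypothetical sketch: only the 50_000 value is taken from the diff above.
const maxDataSizeAllowedToBeEncoded = 50_000;

type StorageTarget = 'MessageStore' | 'DataStore';

// Assumption: data at or below the threshold is embedded with the message as
// `encodedData`; anything larger is written to the DataStore instead.
function chooseStorageTarget(dataSize: number): StorageTarget {
  return dataSize <= maxDataSizeAllowedToBeEncoded ? 'MessageStore' : 'DataStore';
}
```

Under this assumption a 50,000 byte payload stays in the MessageStore, while a 50,001 byte payload goes to the DataStore.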
3 changes: 2 additions & 1 deletion src/core/dwn-error.ts
@@ -48,7 +48,8 @@ export enum DwnErrorCode {
 RecordsWriteDataCidMismatch = 'RecordsWriteDataCidMismatch',
 RecordsWriteDataSizeMismatch = 'RecordsWriteDataSizeMismatch',
 RecordsWriteMissingAuthorizationSignatureInput = 'RecordsWriteMissingAuthorizationSignatureInput',
-RecordsWriteMissingData = 'RecordsWriterMissingData',
+RecordsWriteMissingDataInPrevious = 'RecordsWriteMissingDataInPrevious',
+RecordsWriteMissingDataAssociation = 'RecordsWriteMissingDataAssociation',
 RecordsWriteMissingDataStream = 'RecordsWriteMissingDataStream',
 RecordsWriteMissingProtocol = 'RecordsWriteMissingProtocol',
 RecordsWriteMissingSchema = 'RecordsWriteMissingSchema',
9 changes: 8 additions & 1 deletion src/core/message.ts
@@ -86,7 +86,14 @@ export abstract class Message<M extends GenericMessage> {
 // the message will contain properties that should not be part of the CID computation
 // and we need to strip them out (like `encodedData` that we historically had for a long time),
 // but we can remove this method entirely if the code becomes stable and it is apparent that the wrapper is not needed
-const cid = await Cid.computeCid(message);
+
+// ^--- seems like we might need to keep this around for now.
+const rawMessage = { ...message } as any;
+if (rawMessage.encodedData) {
+delete rawMessage.encodedData;
+}
+
+const cid = await Cid.computeCid(rawMessage as GenericMessage);
 return cid;
 }
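The point of this change is that a message must produce the same CID whether or not `encodedData` is attached to it. The sketch below illustrates that property under stated assumptions: `SketchMessage` is a hypothetical, simplified message shape, and `standInCid` is a toy rolling hash that stands in for `Cid.computeCid`. It does not produce a real CID.

```typescript
// Hypothetical, simplified message shape for illustration only.
type SketchMessage = {
  descriptor: { dataCid: string; dataSize: number };
  encodedData?: string;
};

// Toy stand-in for `Cid.computeCid`: a polynomial rolling hash over the JSON
// form of the value. This is NOT a real CID, only a deterministic fingerprint.
function standInCid(value: unknown): string {
  const text = JSON.stringify(value);
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) % 4294967296;
  }
  return hash.toString(16);
}

// Copy the message, drop `encodedData` from the copy, then fingerprint the copy.
// The caller's object is left untouched, as in the diff above.
function getCidIgnoringEncodedData(message: SketchMessage): string {
  const rawMessage: SketchMessage = { ...message };
  delete rawMessage.encodedData;
  return standInCid(rawMessage);
}
```

Working on a shallow copy matters here: the stored message keeps its `encodedData`, while the identifier depends only on the fields that are part of the message proper.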

27 changes: 8 additions & 19 deletions src/handlers/messages-get.ts
@@ -2,16 +2,13 @@ import type { DataStore } from '../types/data-store.js';
 import type { DidResolver } from '../did/did-resolver.js';
 import type { MessageStore } from '../types/message-store.js';
 import type { MethodHandler } from '../types/method-handler.js';
-import type { RecordsWriteMessage } from '../types/records-types.js';
+import type { RecordsWriteMessageWithOptionalEncodedData } from '../store/storage-controller.js';
 import type { MessagesGetMessage, MessagesGetReply, MessagesGetReplyEntry } from '../types/messages-types.js';

-import { DataStream } from '../utils/data-stream.js';
-import { DwnConstant } from '../core/dwn-constant.js';
-import { Encoder } from '../utils/encoder.js';
 import { messageReplyFromError } from '../core/message-reply.js';
 import { MessagesGet } from '../interfaces/messages-get.js';
 import { authenticate, authorize } from '../core/auth.js';
-import { DwnInterfaceName, DwnMethodName, Message } from '../core/message.js';
+import { DwnInterfaceName, DwnMethodName } from '../core/message.js';

 type HandleArgs = { tenant: string, message: MessagesGetMessage };

@@ -54,7 +51,6 @@ export class MessagesGetHandler implements MethodHandler {
 // for every message, include associated data as `encodedData` IF:
 // * its a RecordsWrite
 // * the data size is equal or smaller than the size threshold
-//! NOTE: this is somewhat duplicate code that also exists in `StorageController.query`.
 for (const entry of messages) {
 const { message } = entry;

@@ -67,19 +63,12 @@ export class MessagesGetHandler implements MethodHandler {
 continue;
 }

-// RecordsWrite specific handling
-const recordsWrite = message as RecordsWriteMessage;
-const dataCid = recordsWrite.descriptor.dataCid;
-const dataSize = recordsWrite.descriptor.dataSize;
-
-if (dataCid !== undefined && dataSize! <= DwnConstant.maxDataSizeAllowedToBeEncoded) {
-const messageCid = await Message.getCid(message);
-const result = await this.dataStore.get(tenant, messageCid, dataCid);
-
-if (result) {
-const dataBytes = await DataStream.toBytes(result.dataStream);
-entry.encodedData = Encoder.bytesToBase64Url(dataBytes);
-}
+// RecordsWrite specific handling, if MessageStore has embedded `encodedData` return it with the entry.
+// we store `encodedData` along with the message if the data is below a certain threshold.
+const recordsWrite = message as RecordsWriteMessageWithOptionalEncodedData;
+if (recordsWrite.encodedData !== undefined) {
+entry.encodedData = recordsWrite.encodedData;
+delete recordsWrite.encodedData;
 }
 }
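With this change the handler no longer reads the DataStore. It only moves an already embedded `encodedData` value from the stored message onto the reply entry. A minimal sketch of that step follows; `StoredRecordsWrite`, `ReplyEntry`, and `liftEncodedData` are hypothetical names with simplified shapes, not the SDK's own types.

```typescript
// Hypothetical, simplified shapes for illustration only.
type StoredRecordsWrite = { descriptor: { method: string }; encodedData?: string };
type ReplyEntry = { message: StoredRecordsWrite; encodedData?: string };

// Move embedded data from the message onto the reply entry, so the returned
// message itself no longer carries `encodedData`.
function liftEncodedData(entry: ReplyEntry): ReplyEntry {
  const { message } = entry;
  if (message.encodedData !== undefined) {
    entry.encodedData = message.encodedData;
    delete message.encodedData;
  }
  return entry;
}
```

Entries whose data was too large to embed pass through unchanged, with no `encodedData` on either the entry or the message.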

11 changes: 5 additions & 6 deletions src/handlers/records-query.ts
@@ -6,7 +6,6 @@ import type { RecordsQueryMessage, RecordsQueryReply, RecordsQueryReplyEntry, Re
 import { authenticate } from '../core/auth.js';
 import { lexicographicalCompare } from '../utils/string.js';
 import { messageReplyFromError } from '../core/message-reply.js';
-import { StorageController } from '../store/storage-controller.js';

 import { DateSort, RecordsQuery } from '../interfaces/records-query.js';
 import { DwnInterfaceName, DwnMethodName } from '../core/message.js';
@@ -86,7 +85,8 @@ export class RecordsQueryHandler implements MethodHandler {
 method : DwnMethodName.Write,
 isLatestBaseState : true
 };
-const records = await StorageController.query(this.messageStore, this.dataStore, tenant, filter);
+
+const records = (await this.messageStore.query(tenant, filter)) as RecordsWriteMessageWithOptionalEncodedData[];
 return records;
 }

@@ -133,7 +133,7 @@ export class RecordsQueryHandler implements MethodHandler {
 published : true,
 isLatestBaseState : true
 };
-const publishedRecords = await StorageController.query(this.messageStore, this.dataStore, tenant, filter);
+const publishedRecords = (await this.messageStore.query(tenant, filter)) as RecordsWriteMessageWithOptionalEncodedData[];
 return publishedRecords;
 }

@@ -152,8 +152,7 @@ export class RecordsQueryHandler implements MethodHandler {
 isLatestBaseState : true,
 published : false
 };
-const unpublishedRecordsForQueryAuthor = await StorageController.query(this.messageStore, this.dataStore, tenant, filter);
-
+const unpublishedRecordsForQueryAuthor = (await this.messageStore.query(tenant, filter)) as RecordsWriteMessageWithOptionalEncodedData[];
 return unpublishedRecordsForQueryAuthor;
 }

@@ -172,7 +171,7 @@ export class RecordsQueryHandler implements MethodHandler {
 isLatestBaseState : true,
 published : false
 };
-const unpublishedRecordsForQueryAuthor = await StorageController.query(this.messageStore, this.dataStore, tenant, filter);
+const unpublishedRecordsForQueryAuthor = (await this.messageStore.query(tenant, filter)) as RecordsWriteMessageWithOptionalEncodedData[];
 return unpublishedRecordsForQueryAuthor;
 }
 }
29 changes: 19 additions & 10 deletions src/handlers/records-read.ts
@@ -1,13 +1,15 @@
 import type { MethodHandler } from '../types/method-handler.js';
+import type { RecordsWriteMessageWithOptionalEncodedData } from '../store/storage-controller.js';
 import type { TimestampedMessage } from '../types/message-types.js';
 import type { DataStore, DidResolver, MessageStore } from '../index.js';
-import type { RecordsReadMessage, RecordsReadReply, RecordsWriteMessage } from '../types/records-types.js';
+import type { RecordsReadMessage, RecordsReadReply } from '../types/records-types.js';

 import { authenticate } from '../core/auth.js';
 import { Message } from '../core/message.js';
 import { messageReplyFromError } from '../core/message-reply.js';
 import { RecordsRead } from '../interfaces/records-read.js';
 import { RecordsWrite } from '../interfaces/records-write.js';
+import { DataStream, Encoder } from '../index.js';
 import { DwnInterfaceName, DwnMethodName } from '../core/message.js';

 export class RecordsReadHandler implements MethodHandler {
@@ -51,28 +53,35 @@ export class RecordsReadHandler implements MethodHandler {
 };
 }

-const newestRecordsWrite = newestExistingMessage as RecordsWriteMessage;
+const newestRecordsWrite = newestExistingMessage as RecordsWriteMessageWithOptionalEncodedData;
 try {
 await recordsRead.authorize(tenant, await RecordsWrite.parse(newestRecordsWrite), this.messageStore);
 } catch (error) {
 return messageReplyFromError(error, 401);
 }

-const messageCid = await Message.getCid(newestRecordsWrite);
-const result = await this.dataStore.get(tenant, messageCid, newestRecordsWrite.descriptor.dataCid);
-
-if (result?.dataStream === undefined) {
-return {
-status: { code: 404, detail: 'Not Found' }
-};
+let data;
+if (newestRecordsWrite.encodedData !== undefined) {
+const dataBytes = Encoder.base64UrlToBytes(newestRecordsWrite.encodedData);
+data = DataStream.fromBytes(dataBytes);
+delete newestRecordsWrite.encodedData;
+} else {
+const messageCid = await Message.getCid(newestRecordsWrite);
+const result = await this.dataStore.get(tenant, messageCid, newestRecordsWrite.descriptor.dataCid);
+if (result?.dataStream === undefined) {
+return {
+status: { code: 404, detail: 'Not Found' }
+};
+}
+data = result.dataStream;
 }

 const { authorization: _, ...recordsWriteWithoutAuthorization } = newestRecordsWrite; // a trick to stripping away `authorization`
 const messageReply: RecordsReadReply ={
 status : { code: 200, detail: 'OK' },
 record : {
 ...recordsWriteWithoutAuthorization,
-data: result.dataStream
+data,
 }
 };
 return messageReply;
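The read path above now has two sources for record data: the embedded `encodedData`, or the DataStore as a fallback. The sketch below mirrors that branching in a simplified, synchronous form. Everything in it is an assumption made for illustration: `base64UrlToBytes` is a small unpadded base64url decoder standing in for `Encoder.base64UrlToBytes`, `lookupInDataStore` stands in for `dataStore.get`, and plain byte arrays replace the SDK's data streams.

```typescript
const BASE64URL_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

// Minimal decoder for unpadded base64url text (stand-in for the SDK's Encoder).
function base64UrlToBytes(text: string): number[] {
  const bytes: number[] = [];
  let buffer = 0; // holds bits that have not yet formed a full byte
  let bits = 0;   // number of valid bits currently in `buffer`
  for (let i = 0; i < text.length; i++) {
    const value = BASE64URL_ALPHABET.indexOf(text.charAt(i));
    if (value < 0) { throw new Error('invalid base64url character'); }
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the leftover low bits
    }
  }
  return bytes;
}

// Hypothetical, simplified shapes for illustration only.
type StoredWrite = { descriptor: { dataCid: string }; encodedData?: string };
type ReadResult = { status: number; data?: number[] };

// Prefer embedded data; otherwise fall back to the data store lookup.
function resolveRecordData(
  message: StoredWrite,
  lookupInDataStore: (dataCid: string) => number[] | undefined
): ReadResult {
  if (message.encodedData !== undefined) {
    const data = base64UrlToBytes(message.encodedData);
    delete message.encodedData; // the reply should not carry the embedded copy
    return { status: 200, data };
  }
  const stored = lookupInDataStore(message.descriptor.dataCid);
  if (stored === undefined) { return { status: 404 }; }
  return { status: 200, data: stored };
}
```

In this sketch a 404 can only occur on the fallback branch, which matches the diff: embedded data is always available once the message itself has been found.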