langchain[minor]: Experimental Masking Module (#3548)
* [Feature] Implementation of experimental masking parser/transformer

* test: add perf unit test

* fix: rename piitransformer to regextransformer

* added example Kitchen Sink for masking parser

* docs: Add documentation, nextjs example and kitchen sink example

* fix: wording

* docs: add basic example

* fix: remove comment and return stream

* feat: async hooks, immutable parser state

* fix: parse -> mask

* fix: || -> ??

* Fix lint, style

* Fix build

* Update mask.mdx

---------

Co-authored-by: Dzmitry Dubarau <[email protected]>
Co-authored-by: Dzmitry A Dubarau <[email protected]>
Co-authored-by: jacoblee93 <[email protected]>
4 people authored Dec 7, 2023
1 parent 1a68f06 commit 078c5e6
Showing 15 changed files with 1,220 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/core_docs/docs/modules/experimental/index.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
sidebar_position: 6
---

# Experimental
34 changes: 34 additions & 0 deletions docs/core_docs/docs/modules/experimental/mask/mask.mdx
@@ -0,0 +1,34 @@
# Masking

The experimental masking parser and transformer form an extensible module for masking and rehydrating strings. A primary use case is redacting PII (personally identifiable information) from a string before sending it to an LLM.
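
To make "masking" and "rehydrating" concrete, here is a minimal, self-contained sketch of the idea. It does not use or reproduce this module's implementation: `maskEmails`, `rehydrate`, the plain-object state, and the `[email-N]` token format are all hypothetical names chosen for illustration.

```typescript
// Hypothetical illustration only; not the module's API or internals.
type MaskState = Record<string, string>; // token -> original value

// Replace every email-like match with a token and remember the original.
function maskEmails(text: string, state: MaskState): string {
  let counter = Object.keys(state).length;
  return text.replace(/\S+@\S+\.\S+/g, (match) => {
    const token = `[email-${counter}]`;
    counter += 1;
    state[token] = match;
    return token;
  });
}

// Put the original values back by replacing each known token.
function rehydrate(text: string, state: MaskState): string {
  let result = text;
  Object.keys(state).forEach((token) => {
    result = result.split(token).join(state[token]);
  });
  return result;
}

const state: MaskState = {};
const masked = maskEmails("Write to jane@example.com today", state);
const restored = rehydrate(masked, state);
console.log(masked);
console.log(restored);
```

The point of the sketch is that rehydration is only possible while the token-to-value state is available, which is why the state itself has to be treated as sensitive.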

### Real world scenario

A customer support system receives messages containing sensitive customer information. The system must parse these messages, mask any PII (such as names, email addresses, and phone numbers), and log them for analysis while complying with privacy regulations. Before the transcript is logged, a summary is generated using an LLM.

## Get started

import CodeBlock from "@theme/CodeBlock";
import ExampleBasic from "@examples/experimental/masking/basic.ts";
import ExampleNext from "@examples/experimental/masking/next.ts";
import ExampleKitchenSink from "@examples/experimental/masking/kitchen_sink.ts";

### Basic Example

Use the `RegexMaskingTransformer` to create a simple mask for email addresses and phone numbers.

<CodeBlock language="typescript">{ExampleBasic}</CodeBlock>

:::note
If you plan to store the masking state so that the original values can be rehydrated asynchronously, make sure you follow security best practices. In most cases you will want to define a custom hashing and salting strategy.
:::
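
One possible shape for such a strategy is a keyed hash (HMAC) over the matched value. The sketch below is an assumption-laden illustration, not a recommendation from the module's authors: `makeSaltedMask` is a hypothetical helper, the salt is a placeholder that would come from a secret store in practice, and it relies on Node.js's built-in `node:crypto`, which may not be available in every runtime (for example, some edge runtimes).

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: returns a mask function keyed by a secret salt.
// The returned function takes the matched string, like the mask
// functions in the kitchen sink example.
function makeSaltedMask(label: string, salt: string) {
  return (match: string): string => {
    const digest = createHmac("sha256", salt).update(match).digest("hex");
    // Truncated for readability; keep more of the digest if collisions matter.
    return `[${label}-${digest.slice(0, 16)}]`;
  };
}

// Placeholder salt for illustration; load a real secret from configuration.
const saltedEmailMask = makeSaltedMask("email", "replace-with-a-secret-salt");
const tokenA = saltedEmailMask("jane@example.com");
const tokenB = saltedEmailMask("jane@example.com");
const tokenC = makeSaltedMask("email", "another-salt")("jane@example.com");
console.log(tokenA === tokenB, tokenA === tokenC);
```

With a keyed hash, the same input always maps to the same token under one salt, while tokens produced under a different salt do not line up, so a leaked token is harder to correlate without the secret.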

### Next.js stream

An example Next.js chat endpoint that uses the `RegexMaskingTransformer`. The current chat message and the chat message history are both masked every time the API is called with a chat payload.

<CodeBlock language="typescript">{ExampleNext}</CodeBlock>
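
The split between history and current message can be sketched without any framework or LangChain dependency. Everything here is hypothetical and for illustration only: `toyHash`, `maskPhones`, and `splitAndMask` are made-up names, and the hash is a toy, not something to rely on for security. The sketch shows that a deterministic mask gives the same token to a value that appears in both the history and the current message.

```typescript
type ChatMessage = { role: string; content: string };

// Toy deterministic hash, for illustration only (not cryptographically secure).
function toyHash(input: string): string {
  let hash = 0;
  for (let i = 0; i < input.length; i += 1) {
    hash = (hash << 5) - hash + input.charCodeAt(i);
    hash |= 0; // Keep the value in 32-bit integer range
  }
  return hash.toString(16);
}

// Replace phone-like matches with a token derived from the matched value.
const maskPhones = (text: string): string =>
  text.replace(/\d{3}-\d{3}-\d{4}/g, (match) => `[phone-${toyHash(match)}]`);

// Mask the history (all but the last message) and the current message separately.
function splitAndMask(messages: ChatMessage[]) {
  const history = messages
    .slice(0, -1)
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  const current =
    messages.length > 0 ? messages[messages.length - 1].content : "";
  return { history: maskPhones(history), input: maskPhones(current) };
}

const result = splitAndMask([
  { role: "user", content: "Call 555-123-4567" },
  { role: "assistant", content: "Noted." },
  { role: "user", content: "Again, it is 555-123-4567" },
]);
console.log(result.history);
console.log(result.input);
```

Because the token is derived from the matched value, the model sees one consistent placeholder across turns, which keeps references in a summary coherent.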

### Kitchen sink

<CodeBlock language="typescript">{ExampleKitchenSink}</CodeBlock>
4 changes: 4 additions & 0 deletions docs/core_docs/docs/modules/index.mdx
@@ -29,3 +29,7 @@ Persist application state between runs of a chain
#### [Callbacks](/docs/modules/callbacks/)

Log and stream intermediate steps of any chain

#### [Experimental](/docs/modules/experimental/)

Experimental modules whose abstractions have not fully settled
30 changes: 30 additions & 0 deletions examples/src/experimental/masking/basic.ts
@@ -0,0 +1,30 @@
import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";

// Define the masking strategy: each match is replaced with a random token
const emailMask = () => `[email-${Math.random().toString(16).slice(2)}]`;
const phoneMask = () => `[phone-${Math.random().toString(16).slice(2)}]`;

// Configure the PII transformer
const piiMaskingTransformer = new RegexMaskingTransformer({
  email: { regex: /\S+@\S+\.\S+/g, mask: emailMask },
  phone: { regex: /\d{3}-\d{3}-\d{4}/g, mask: phoneMask },
});

// Register the transformer with the parser
const maskingParser = new MaskingParser({
  transformers: [piiMaskingTransformer],
});

const input =
  "Contact me at [email protected] or 555-123-4567. Also reach me at [email protected]";
const masked = await maskingParser.parse(input);

console.log(masked);
// Contact me at [email-a31e486e324f6] or [phone-da8fc1584f224]. Also reach me at [email-d5b6237633d95]

const rehydrated = await maskingParser.rehydrate(masked);
console.log(rehydrated);
// Contact me at [email protected] or 555-123-4567. Also reach me at [email protected]
80 changes: 80 additions & 0 deletions examples/src/experimental/masking/kitchen_sink.ts
@@ -0,0 +1,80 @@
import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";

// A simple hash function for demonstration purposes
function simpleHash(input: string): string {
  let hash = 0;
  for (let i = 0; i < input.length; i += 1) {
    const char = input.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash |= 0; // Convert to 32bit integer
  }
  return hash.toString(16);
}

const emailMask = (match: string) => `[email-${simpleHash(match)}]`;
const phoneMask = (match: string) => `[phone-${simpleHash(match)}]`;
const nameMask = (match: string) => `[name-${simpleHash(match)}]`;
const ssnMask = (match: string) => `[ssn-${simpleHash(match)}]`;
const creditCardMask = (match: string) => `[creditcard-${simpleHash(match)}]`;
const passportMask = (match: string) => `[passport-${simpleHash(match)}]`;
const licenseMask = (match: string) => `[license-${simpleHash(match)}]`;
const addressMask = (match: string) => `[address-${simpleHash(match)}]`;
const dobMask = (match: string) => `[dob-${simpleHash(match)}]`;
const bankAccountMask = (match: string) => `[bankaccount-${simpleHash(match)}]`;

// Regular expressions for different types of PII
const patterns = {
  email: { regex: /\S+@\S+\.\S+/g, mask: emailMask },
  phone: { regex: /\b\d{3}-\d{3}-\d{4}\b/g, mask: phoneMask },
  name: { regex: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g, mask: nameMask },
  ssn: { regex: /\b\d{3}-\d{2}-\d{4}\b/g, mask: ssnMask },
  creditCard: { regex: /\b(?:\d{4}[ -]?){3}\d{4}\b/g, mask: creditCardMask },
  // JavaScript has no inline (?i) flag; use the `i` flag for case-insensitive matching
  passport: { regex: /\b[A-Z]{1,2}\d{6,9}\b/gi, mask: passportMask },
  license: { regex: /\b[A-Z]{1,2}\d{6,8}\b/gi, mask: licenseMask },
  address: {
    regex: /\b\d{1,5}\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b/g,
    mask: addressMask,
  },
  dob: { regex: /\b\d{4}-\d{2}-\d{2}\b/g, mask: dobMask },
  bankAccount: { regex: /\b\d{8,17}\b/g, mask: bankAccountMask },
};

// Create a RegexMaskingTransformer with multiple patterns
const piiMaskingTransformer = new RegexMaskingTransformer(patterns);

// Hooks for different stages of masking and rehydrating
const onMaskingStart = (message: string) =>
  console.log(`Starting to mask message: ${message}`);
const onMaskingEnd = (maskedMessage: string) =>
  console.log(`Masked message: ${maskedMessage}`);
const onRehydratingStart = (message: string) =>
  console.log(`Starting to rehydrate message: ${message}`);
const onRehydratingEnd = (rehydratedMessage: string) =>
  console.log(`Rehydrated message: ${rehydratedMessage}`);

// Initialize MaskingParser with the transformer and hooks
const maskingParser = new MaskingParser({
  transformers: [piiMaskingTransformer],
  onMaskingStart,
  onMaskingEnd,
  onRehydratingStart,
  onRehydratingEnd,
});

// Example message containing multiple types of PII
const message =
"Contact Jane Doe at [email protected] or 555-123-4567. Her SSN is 123-45-6789 and her credit card number is 1234-5678-9012-3456. Passport number: AB1234567, Driver's License: X1234567, Address: 123 Main St, Date of Birth: 1990-01-01, Bank Account: 12345678901234567.";

// Mask and rehydrate the message
maskingParser
  .parse(message)
  .then((maskedMessage: string) => {
    console.log(`Masked message: ${maskedMessage}`);
    return maskingParser.rehydrate(maskedMessage);
  })
  .then((rehydratedMessage: string) => {
    console.log(`Final rehydrated message: ${rehydratedMessage}`);
  });
69 changes: 69 additions & 0 deletions examples/src/experimental/masking/next.ts
@@ -0,0 +1,69 @@
// app/api/chat

import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";
import { PromptTemplate } from "langchain/prompts";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { BytesOutputParser } from "langchain/schema/output_parser";

export const runtime = "edge";

// Function to format chat messages for consistency
const formatMessage = (message: any) => `${message.role}: ${message.content}`;

const CUSTOMER_SUPPORT = `You are a customer support summarizer agent. Always include masked PII in your response.
Current conversation:
{chat_history}
User: {input}
AI:`;

// Configure Masking Parser
const maskingParser = new MaskingParser();
// Define transformations for masking emails and phone numbers using regular expressions
const piiMaskingTransformer = new RegexMaskingTransformer({
  email: { regex: /\S+@\S+\.\S+/g }, // If a regex is provided without a mask, we fall back to a simple default hashing function
  phone: { regex: /\d{3}-\d{3}-\d{4}/g },
});

maskingParser.addTransformer(piiMaskingTransformer);

export async function POST(req: Request) {
  try {
    const body = await req.json();
    const messages = body.messages ?? [];
    const formattedPreviousMessages = messages.slice(0, -1).map(formatMessage);
    const currentMessageContent = messages[messages.length - 1].content; // Extract the content of the last message
    // Mask sensitive information in the current message
    const guardedMessageContent = await maskingParser.parse(
      currentMessageContent
    );
    // Mask sensitive information in the chat history
    const guardedHistory = await maskingParser.parse(
      formattedPreviousMessages.join("\n")
    );

    const prompt = PromptTemplate.fromTemplate(CUSTOMER_SUPPORT);
    const model = new ChatOpenAI({ temperature: 0.8 });
    // Initialize an output parser that handles serialization and byte-encoding for streaming
    const outputParser = new BytesOutputParser();
    const chain = prompt.pipe(model).pipe(outputParser); // Chain the prompt, model, and output parser together

    console.log("[GUARDED INPUT]", guardedMessageContent); // Contact me at -1157967895 or -1626926859.
    console.log("[GUARDED HISTORY]", guardedHistory); // user: Contact me at -1157967895 or -1626926859. assistant: Thank you for providing your contact information.
    console.log("[STATE]", maskingParser.getState()); // { '-1157967895' => '[email protected]', '-1626926859' => '555-123-4567'}

    // Stream the AI response based on the masked chat history and current message
    const stream = await chain.stream({
      chat_history: guardedHistory,
      input: guardedMessageContent,
    });

    return new Response(stream, {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  } catch (e: any) {
    return Response.json({ error: e.message }, { status: 500 });
  }
}
1 change: 1 addition & 0 deletions langchain/scripts/create-entrypoints.js
@@ -325,6 +325,7 @@ const entrypoints = {
    "experimental/hubs/makersuite/googlemakersuitehub",
  "experimental/chains/violation_of_expectations":
    "experimental/chains/violation_of_expectations/index",
  "experimental/masking": "experimental/masking/index",
  "experimental/tools/pyinterpreter": "experimental/tools/pyinterpreter",
  // evaluation
  evaluation: "evaluation/index",
8 changes: 8 additions & 0 deletions langchain/src/experimental/masking/index.ts
@@ -0,0 +1,8 @@
export { MaskingParser } from "./parser.js";
export { RegexMaskingTransformer } from "./regex_masking_transformer.js";
export { MaskingTransformer } from "./transformer.js";
export {
  type MaskingParserConfig,
  type HashFunction,
  type HookFunction,
} from "./types.js";
2 comments on commit 078c5e6

@vercel vercel bot commented on 078c5e6 Dec 7, 2023

Successfully deployed to the following URLs:

langchainjs-docs – ./docs/core_docs/

langchainjs-docs-langchain.vercel.app
langchainjs-docs-git-main-langchain.vercel.app
langchainjs-docs-ruddy.vercel.app
js.langchain.com
