langchain[minor]: Experimental Masking Module #3548
Merged
22 commits
5301fc4 [Feature] Implementation of experimental masking parser/transformer (Jordan-Gilliam)
6997222 test: add perf unit test (Jordan-Gilliam)
aeb37d7 fix: rename piitransformer to regextransformer (Jordan-Gilliam)
d85d15d added example Kitchen Sink for masking parser
bc2337a merged with main
af49c2b docs: Add documentation, nextjs example and kitchen sink example (Jordan-Gilliam)
0bfdadc Merge pull request #1 from Ally-Financial/feature/masking (ddzmitry)
ed9bc40 fix: wording (Jordan-Gilliam)
43137ec docs: add basic example (Jordan-Gilliam)
ce1a3e9 fix: remove comment and return stream (Jordan-Gilliam)
0279385 Merge branch 'main' into feature/masking (Jordan-Gilliam)
647b730 Merge pull request #2 from Ally-Financial/feature/masking (Jordan-Gilliam)
81f8557 feat: async hooks, immutable parser state (Jordan-Gilliam)
94738cc fix: parse -> mask (Jordan-Gilliam)
827d514 fix: || -> ?? (Jordan-Gilliam)
6e4d945 Merge branch 'feature/masking' of https://github.com/Ally-Financial/l… (Jordan-Gilliam)
7198be3 Merge pull request #3 from Ally-Financial/feature/masking (Jordan-Gilliam)
8e9360e Fix lint, style (jacoblee93)
da86cc9 Merge branch 'main' of https://github.com/hwchase17/langchainjs into … (jacoblee93)
d8efef7 Merge pull request #4 from Ally-Financial/jacob/ally_parser (Jordan-Gilliam)
13d4f19 Fix build (jacoblee93)
07f2ef6 Update mask.mdx (jacoblee93)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
sidebar_position: 6 | ||
--- | ||
|
||
# Experimental |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Masking | ||
|
||
The experimental masking parser and transformer is an extendable module for masking and rehydrating strings. One of the primary use cases for this module is to redact PII (Personally Identifiable Information) from a string before making a call to an LLM. | ||
|
||
### Real world scenario | ||
|
||
A customer support system receives messages containing sensitive customer information. The system must parse these messages, mask any PII (such as names, email addresses, and phone numbers), and log them for analysis while complying with privacy regulations. Before the transcript is logged, a summary is generated using an LLM. | ||
|
||
## Get started | ||
|
||
import CodeBlock from "@theme/CodeBlock"; | ||
import ExampleBasic from "@examples/experimental/masking/basic.ts"; | ||
import ExampleNext from "@examples/experimental/masking/next.ts"; | ||
import ExampleKitchenSink from "@examples/experimental/masking/kitchen_sink.ts"; | ||
|
||
### Basic Example | ||
|
||
Use the RegexMaskingTransformer to create a simple mask for email addresses and phone numbers. | ||
|
||
<CodeBlock language="typescript">{ExampleBasic}</CodeBlock> | ||
|
||
:::note | ||
If you plan on storing the masking state to rehydrate the original values asynchronously, ensure you are following security best practices. In most cases, you will want to define a custom hashing and salting strategy. | ||
::: | ||
|
||
### Next.js stream | ||
|
||
Example Next.js chat endpoint leveraging the RegexMaskingTransformer. The current chat message and the chat message history are masked every time the API is called with a chat payload. | ||
|
||
<CodeBlock language="typescript">{ExampleNext}</CodeBlock> | ||
|
||
### Kitchen sink | ||
|
||
<CodeBlock language="typescript">{ExampleKitchenSink}</CodeBlock> |
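The note in the documentation above recommends a custom hashing and salting strategy when the masking state is stored. The following is a minimal sketch of one way to do that, not the module's built-in behavior. It assumes a Node.js runtime with `node:crypto`, and `createSaltedMask` is a hypothetical helper that returns a function with the `(match: string) => string` shape used by the mask functions in the kitchen sink example. The salt value shown is a placeholder; in practice it would come from a secret store.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper (not part of langchain): builds a mask function
// of the shape (match) => string, keyed with a secret salt.
function createSaltedMask(label: string, salt: string) {
  return (match: string): string => {
    // HMAC-SHA256 keyed with the salt; truncated to keep tokens short.
    const digest = createHmac("sha256", salt).update(match).digest("hex");
    return `[${label}-${digest.slice(0, 16)}]`;
  };
}

// Placeholder salt for illustration only.
const emailMask = createSaltedMask("email", "replace-with-a-secret-salt");
const token = emailMask("jane@example.com");
console.log(token);
```

Because the token depends on both the matched value and the salt, the same value always maps to the same token under one salt, while a different salt produces unrelated tokens.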
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
import { | ||
MaskingParser, | ||
RegexMaskingTransformer, | ||
} from "langchain/experimental/masking"; | ||
|
||
// Define masking strategy | ||
const emailMask = () => `[email-${Math.random().toString(16).slice(2)}]`; | ||
const phoneMask = () => `[phone-${Math.random().toString(16).slice(2)}]`; | ||
|
||
// Configure pii transformer | ||
const piiMaskingTransformer = new RegexMaskingTransformer({ | ||
email: { regex: /\S+@\S+\.\S+/g, mask: emailMask }, | ||
phone: { regex: /\d{3}-\d{3}-\d{4}/g, mask: phoneMask }, | ||
}); | ||
|
||
const maskingParser = new MaskingParser({ | ||
transformers: [piiMaskingTransformer], | ||
}); | ||
|
||
const input = | ||
"Contact me at [email protected] or 555-123-4567. Also reach me at [email protected]"; | ||
const masked = await maskingParser.parse(input); | ||
|
||
console.log(masked); | ||
// Contact me at [email-a31e486e324f6] or [phone-da8fc1584f224]. Also reach me at [email-d5b6237633d95] | ||
|
||
const rehydrated = await maskingParser.rehydrate(masked); | ||
console.log(rehydrated); | ||
// Contact me at [email protected] or 555-123-4567. Also reach me at [email protected] |
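The mask and rehydrate round trip in the basic example can be pictured with a small stand-in. This is a sketch of the general idea only and is not the langchain implementation: `TinyMasker` is a hypothetical class, the sample address is made up, and a counter replaces the random suffix so the output is predictable.

```typescript
// Illustrative stand-in, NOT the langchain implementation.
type Pattern = { regex: RegExp; mask: (match: string) => string };

class TinyMasker {
  // Maps each mask token back to the original value it replaced.
  private state = new Map<string, string>();
  private patterns: Pattern[];

  constructor(patterns: Pattern[]) {
    this.patterns = patterns;
  }

  mask(input: string): string {
    let output = input;
    for (const { regex, mask } of this.patterns) {
      output = output.replace(regex, (match) => {
        const token = mask(match);
        this.state.set(token, match); // remember how to undo this token
        return token;
      });
    }
    return output;
  }

  rehydrate(masked: string): string {
    let output = masked;
    this.state.forEach((original, token) => {
      // Replace every occurrence of the token with the stored original.
      output = output.split(token).join(original);
    });
    return output;
  }
}

let counter = 0;
const masker = new TinyMasker([
  { regex: /\S+@\S+\.\S+/g, mask: () => `[email-${++counter}]` },
  { regex: /\d{3}-\d{3}-\d{4}/g, mask: () => `[phone-${++counter}]` },
]);

const sampleInput = "Contact me at jane@example.com or 555-123-4567.";
const maskedSample = masker.mask(sampleInput);
const restoredSample = masker.rehydrate(maskedSample);
console.log(maskedSample); // Contact me at [email-1] or [phone-2].
console.log(restoredSample === sampleInput); // true
```

Rehydration only works for tokens recorded in the state, which is why the state has to be kept (and protected) for as long as the original values may be needed.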
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
import { | ||
MaskingParser, | ||
RegexMaskingTransformer, | ||
} from "langchain/experimental/masking"; | ||
|
||
// A simple hash function for demonstration purposes | ||
function simpleHash(input: string): string { | ||
let hash = 0; | ||
for (let i = 0; i < input.length; i += 1) { | ||
const char = input.charCodeAt(i); | ||
hash = (hash << 5) - hash + char; | ||
hash |= 0; // Convert to 32bit integer | ||
} | ||
return hash.toString(16); | ||
} | ||
|
||
const emailMask = (match: string) => `[email-${simpleHash(match)}]`; | ||
const phoneMask = (match: string) => `[phone-${simpleHash(match)}]`; | ||
const nameMask = (match: string) => `[name-${simpleHash(match)}]`; | ||
const ssnMask = (match: string) => `[ssn-${simpleHash(match)}]`; | ||
const creditCardMask = (match: string) => `[creditcard-${simpleHash(match)}]`; | ||
const passportMask = (match: string) => `[passport-${simpleHash(match)}]`; | ||
const licenseMask = (match: string) => `[license-${simpleHash(match)}]`; | ||
const addressMask = (match: string) => `[address-${simpleHash(match)}]`; | ||
const dobMask = (match: string) => `[dob-${simpleHash(match)}]`; | ||
const bankAccountMask = (match: string) => `[bankaccount-${simpleHash(match)}]`; | ||
|
||
// Regular expressions for different types of PII | ||
const patterns = { | ||
email: { regex: /\S+@\S+\.\S+/g, mask: emailMask }, | ||
phone: { regex: /\b\d{3}-\d{3}-\d{4}\b/g, mask: phoneMask }, | ||
name: { regex: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g, mask: nameMask }, | ||
ssn: { regex: /\b\d{3}-\d{2}-\d{4}\b/g, mask: ssnMask }, | ||
creditCard: { regex: /\b(?:\d{4}[ -]?){3}\d{4}\b/g, mask: creditCardMask }, | ||
passport: { regex: /\b[A-Z]{1,2}\d{6,9}\b/gi, mask: passportMask }, | ||
license: { regex: /\b[A-Z]{1,2}\d{6,8}\b/gi, mask: licenseMask }, | ||
address: { | ||
regex: /\b\d{1,5}\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b/g, | ||
mask: addressMask, | ||
}, | ||
dob: { regex: /\b\d{4}-\d{2}-\d{2}\b/g, mask: dobMask }, | ||
bankAccount: { regex: /\b\d{8,17}\b/g, mask: bankAccountMask }, | ||
}; | ||
|
||
// Create a RegexMaskingTransformer with multiple patterns | ||
const piiMaskingTransformer = new RegexMaskingTransformer(patterns); | ||
|
||
// Hooks for different stages of masking and rehydrating | ||
const onMaskingStart = (message: string) => | ||
console.log(`Starting to mask message: ${message}`); | ||
const onMaskingEnd = (maskedMessage: string) => | ||
console.log(`Masked message: ${maskedMessage}`); | ||
const onRehydratingStart = (message: string) => | ||
console.log(`Starting to rehydrate message: ${message}`); | ||
const onRehydratingEnd = (rehydratedMessage: string) => | ||
console.log(`Rehydrated message: ${rehydratedMessage}`); | ||
|
||
// Initialize MaskingParser with the transformer and hooks | ||
const maskingParser = new MaskingParser({ | ||
transformers: [piiMaskingTransformer], | ||
onMaskingStart, | ||
onMaskingEnd, | ||
onRehydratingStart, | ||
onRehydratingEnd, | ||
}); | ||
|
||
// Example message containing multiple types of PII | ||
const message = | ||
"Contact Jane Doe at [email protected] or 555-123-4567. Her SSN is 123-45-6789 and her credit card number is 1234-5678-9012-3456. Passport number: AB1234567, Driver's License: X1234567, Address: 123 Main St, Date of Birth: 1990-01-01, Bank Account: 12345678901234567."; | ||
|
||
// Mask and rehydrate the message | ||
maskingParser | ||
.parse(message) | ||
.then((maskedMessage: string) => { | ||
console.log(`Masked message: ${maskedMessage}`); | ||
return maskingParser.rehydrate(maskedMessage); | ||
}) | ||
.then((rehydratedMessage: string) => { | ||
console.log(`Final rehydrated message: ${rehydratedMessage}`); | ||
}); |
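Broad patterns like the ones in this example can match more than intended, so it may help to list what each pattern captures on a sample message before using it for masking. The sketch below assumes a hypothetical helper, `findMatches`, and a shortened sample sentence; it is independent of the masking module.

```typescript
type PatternMap = { [label: string]: RegExp };

// Hypothetical helper: collects every match per label so that
// over-eager patterns can be spotted before they are used for masking.
function findMatches(patterns: PatternMap, text: string) {
  const result: { [label: string]: string[] } = {};
  for (const label of Object.keys(patterns)) {
    // String.prototype.match with a global regex returns all matches or null.
    result[label] = text.match(patterns[label]) ?? [];
  }
  return result;
}

const report = findMatches(
  {
    name: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g,
    ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  },
  "Contact Jane Doe at 555-123-4567. Her SSN is 123-45-6789."
);
console.log(report.name); // [ 'Contact Jane' ]
console.log(report.ssn); // [ '123-45-6789' ]
```

Here the name pattern captures "Contact Jane" rather than "Jane Doe", because any two adjacent capitalized words satisfy it. A check like this makes such false positives visible early.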
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
// app/api/chat | ||
|
||
import { | ||
MaskingParser, | ||
RegexMaskingTransformer, | ||
} from "langchain/experimental/masking"; | ||
import { PromptTemplate } from "langchain/prompts"; | ||
import { ChatOpenAI } from "langchain/chat_models/openai"; | ||
import { BytesOutputParser } from "langchain/schema/output_parser"; | ||
|
||
export const runtime = "edge"; | ||
|
||
// Function to format chat messages for consistency | ||
const formatMessage = (message: any) => `${message.role}: ${message.content}`; | ||
|
||
const CUSTOMER_SUPPORT = `You are a customer support summarizer agent. Always include masked PII in your response. | ||
Current conversation: | ||
{chat_history} | ||
User: {input} | ||
AI:`; | ||
|
||
// Configure Masking Parser | ||
const maskingParser = new MaskingParser(); | ||
// Define transformations for masking emails and phone numbers using regular expressions | ||
const piiMaskingTransformer = new RegexMaskingTransformer({ | ||
email: { regex: /\S+@\S+\.\S+/g }, // If a regex is provided without a mask, we fall back to a simple default hashing function | ||
phone: { regex: /\d{3}-\d{3}-\d{4}/g }, | ||
}); | ||
|
||
maskingParser.addTransformer(piiMaskingTransformer); | ||
|
||
export async function POST(req: Request) { | ||
try { | ||
const body = await req.json(); | ||
const messages = body.messages ?? []; | ||
const formattedPreviousMessages = messages.slice(0, -1).map(formatMessage); | ||
const currentMessageContent = messages[messages.length - 1].content; // Extract the content of the last message | ||
// Mask sensitive information in the current message | ||
const guardedMessageContent = await maskingParser.parse( | ||
currentMessageContent | ||
); | ||
// Mask sensitive information in the chat history | ||
const guardedHistory = await maskingParser.parse( | ||
formattedPreviousMessages.join("\n") | ||
); | ||
|
||
const prompt = PromptTemplate.fromTemplate(CUSTOMER_SUPPORT); | ||
const model = new ChatOpenAI({ temperature: 0.8 }); | ||
// Initialize an output parser that handles serialization and byte-encoding for streaming | ||
const outputParser = new BytesOutputParser(); | ||
const chain = prompt.pipe(model).pipe(outputParser); // Chain the prompt, model, and output parser together | ||
|
||
console.log("[GUARDED INPUT]", guardedMessageContent); // Contact me at -1157967895 or -1626926859. | ||
console.log("[GUARDED HISTORY]", guardedHistory); // user: Contact me at -1157967895 or -1626926859. assistant: Thank you for providing your contact information. | ||
console.log("[STATE]", maskingParser.getState()); // { '-1157967895' => '[email protected]', '-1626926859' => '555-123-4567'} | ||
|
||
// Stream the AI response based on the masked chat history and current message | ||
const stream = await chain.stream({ | ||
chat_history: guardedHistory, | ||
input: guardedMessageContent, | ||
}); | ||
|
||
return new Response(stream, { | ||
headers: { "content-type": "text/plain; charset=utf-8" }, | ||
}); | ||
} catch (e: any) { | ||
return Response.json({ error: e.message }, { status: 500 }); | ||
} | ||
} |
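The comment in the endpoint above says that a pattern without a `mask` falls back to a simple default hashing function. A minimal sketch of that kind of fallback follows. The names `defaultHash` and `resolveMask` are hypothetical, and the hash is an assumption modeled on the 32-bit rolling hash from the kitchen sink example, rendered in decimal to resemble the keys in the logged state; the library's actual default may differ.

```typescript
type MaskFn = (match: string) => string;

// Assumed default: a 32-bit rolling hash rendered as a decimal string.
function defaultHash(input: string): string {
  let hash = 0;
  for (let i = 0; i < input.length; i += 1) {
    hash = (hash << 5) - hash + input.charCodeAt(i);
    hash |= 0; // Keep the value within 32-bit integer range
  }
  return hash.toString();
}

// Hypothetical resolver: `??` falls back only when mask is null or undefined.
function resolveMask(mask?: MaskFn): MaskFn {
  return mask ?? defaultHash;
}

console.log(resolveMask()("ab")); // 3105
console.log(resolveMask(() => "[custom]")("ab")); // [custom]
```

A non-cryptographic hash like this is reversible by brute force for short inputs, which is one reason to supply a custom mask when the state or the masked text is stored.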
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
export { MaskingParser } from "./parser.js"; | ||
export { RegexMaskingTransformer } from "./regex_masking_transformer.js"; | ||
export { MaskingTransformer } from "./transformer.js"; | ||
export { | ||
type MaskingParserConfig, | ||
type HashFunction, | ||
type HookFunction, | ||
} from "./types.js"; |
Allow async functions here?
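One way to accept both synchronous and asynchronous functions is to type the callback as returning `void | Promise<void>` and to `await` its result. The sketch below is an assumption about how that could look, not the code under review: `Hook` and `runWithHooks` are hypothetical names.

```typescript
// Hypothetical type: a hook may return nothing or a promise.
type Hook = (message: string) => void | Promise<void>;

async function runWithHooks(
  message: string,
  transform: (message: string) => string,
  onStart?: Hook,
  onEnd?: Hook
): Promise<string> {
  // Awaiting a non-promise value resolves immediately,
  // so synchronous hooks work unchanged.
  if (onStart) await onStart(message);
  const result = transform(message);
  if (onEnd) await onEnd(result);
  return result;
}

const events: string[] = [];
const done = runWithHooks(
  "abc",
  (m) => m.toUpperCase(),
  (m) => {
    events.push(`start:${m}`); // synchronous hook
  },
  async (m) => {
    events.push(`end:${m}`); // asynchronous hook
  }
);
done.then((out) => console.log(out, events));
```

The caller then has to be `async` as well, since the hooks are awaited in sequence before the result is returned.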