SEGFAULTs when parallelizing OpenAI transcriptions #14918

Closed
par5ul1 opened this issue Oct 30, 2024 · 6 comments · Fixed by #14921
Labels: confirmed bug (We can reproduce this issue), crash (An issue that could cause a crash), macOS (An issue that occurs on macOS)

Comments


par5ul1 commented Oct 30, 2024

How can we reproduce the crash?

Hey y'all! After much investigative work, I am happy to present a segmentation fault that has been driving me nuts. The only way I have been able to reproduce this has been by a) creating an mp3 using the ffmpeg command attached, and b) passing it into the official openai SDK for transcription. Luckily, the OAI SDK source code is clean and accessible in the package, so the implementation is not opaque. Please let me know if I can provide any more insight.

I attached a combo of a few different reports below. I don't have all of them but these should be helpful enough I hope.

EDIT: I added a crash from the latest version of Bun. I'm on an outdated one on my project due to a different unfixed bug.

This is as minimal a repro as I could get that still breaks consistently. Unfortunately, an OpenAI key is required. I really tried to make this break without one.

import { spawn } from "child_process";
import fs from "fs";
import { Buffer } from "node:buffer";
import OpenAI from "openai";
import { toFile } from "openai/uploads.mjs";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function simulateFFmpegAudioExtraction(
  duration: number
): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn("ffmpeg", [
      "-i",
      "pipe:0",
      "-ss",
      "00:00:00",
      "-t",
      duration.toString(),
      "-q:a",
      "2",
      "-f",
      "mp3",
      "pipe:1",
    ]);

    const chunks: Buffer[] = [];

    ffmpeg.stdout.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    ffmpeg.stderr.on("error", (data) =>
      console.error(`ffmpeg stderr: ${data}`)
    );

    ffmpeg.on("close", (code) => {
      if (code === 0) {
        const outputBuffer = Buffer.concat(chunks);
        const base64Data = outputBuffer.toString("base64");
        resolve(Buffer.from(base64Data, "base64"));
      } else {
        reject(new Error(`ffmpeg exited with code ${code}`));
      }
    });

    const inputBuffer = fs.readFileSync("./audio.mp3");
    ffmpeg.stdin.write(inputBuffer);
    ffmpeg.stdin.end();
  });
}

async function processAudioChunk(index: number) {
  const audioData = await simulateFFmpegAudioExtraction(5);
  const audio = await toFile(audioData, "audio.mp3");
  await openai.audio.transcriptions.create({
    file: audio,
    model: "whisper-1",
    prompt: "test",
  });
}

async function runParallel() {
  const CONCURRENCY = 50;
  const TOTAL = 250;
  const active = new Set<Promise<void>>();

  for (let i = 0; i < TOTAL; i++) {
    const promise = processAudioChunk(i)
      .then(() => {
        active.delete(promise);
        console.log(`Completed ${i + 1}/${TOTAL}`);
      })
      .catch((error) => {
        console.error(`Error in chunk ${i}:`, error);
        active.delete(promise);
        throw error;
      });

    active.add(promise);

    if (active.size >= CONCURRENCY) {
      await Promise.race(Array.from(active));
    }
  }

  await Promise.all(Array.from(active));
}

runParallel()
  .then(() => console.log("Completed successfully"))
  .catch((error) => {
    console.error("Fatal error:", error);
    process.exit(1);
  });

Relevant log output

No response

Stack Trace (bun.report)

Bun v1.1.29 (6d43b36) on macos aarch64 [RunCommand]

Segmentation fault at address 0x00050004

  • 1 unknown/js code
  • JSC::JSLexicalEnvironment::create
  • operationCreateLexicalEnvironmentUndefined
  • 6 unknown/js code
  • vmEntryToJavaScript

Features: jsc, Bun.stdin, dotenv, fetch, spawn, transpiler_cache, tsconfig_paths, tsconfig

Sentry Issue: BUN-7R0


Bun v1.1.29 (6d43b36) on macos aarch64 [AutoCommand]

Segmentation fault at address 0x101B411000000

  • 1 unknown/js code
  • JSC::ArrayBufferContents::~ArrayBufferContents
  • WTF::DeferrableRefCounted<JSC::ArrayBuffer>::setIsDeferred
  • bool JSC::GCIncomingRefCounted<JSC::ArrayBuffer>::filterIncomingReferences<JSC::GCIncomingRefCountedSet<JSC::ArrayBuffer>::sweep(...)::'lambda'(...)::operator(...) const::'lambda'(...)>
  • JSC::Heap::runEndPhase
  • JSC::Heap::runCurrentPhase
  • WTF::ScopedLambdaFunctor<void (...), JSC::Heap::collectInMutatorThread()::$_0>::implFunction
  • JSC::callWithCurrentThreadState
  • JSC::Heap::collectInMutatorThread
  • JSC::Heap::collectIfNecessaryOrDefer

Features: jsc, Bun.stdin, dotenv, fetch, spawn, tsconfig


Bun v1.1.29 (6d43b36) on macos aarch64 [RunCommand]

Segmentation fault at address 0x93D5BECD8

  • 1 unknown/js code
  • JSC::DFG::ByteCodeParser::Terminality JSC::DFG::ByteCodeParser::handleVarargsCall<JSC::OpTailCallVarargs>
  • JSC::DFG::ByteCodeParser::Terminality JSC::DFG::ByteCodeParser::handleVarargsCall<JSC::OpTailCallVarargs>
  • JSC::DFG::ByteCodeParser::parseBlock
  • JSC::DFG::ByteCodeParser::parseCodeBlock
  • JSC::DFG::ByteCodeParser::parse
  • JSC::DFG::parse
  • JSC::DFG::Plan::compileInThreadImpl
  • JSC::JITPlan::compileInThread
  • JSC::JITWorklistThread::work

Features: jsc, Bun.stdin, dotenv, fetch, spawn, transpiler_cache, tsconfig_paths, tsconfig

Sentry Issue: BUN-7QR


Bun v1.1.29 (6d43b36) on macos aarch64 [RunCommand]

Segmentation fault at address 0x444800013EF64446

  • 1 unknown/js code
  • decltype(...) std::__1::__variant_detail::__visitation::__base::__dispatcher<0ul>::__dispatch[abi:nn180100]<std::__1::__variant_detail::__dtor<std::__1::__variant_detail::__traits<JSC::StructureTransitionStructureStubClearingWatchpoint, JSC::AdaptiveValueStructureStubClearingWatchpoint>, (...)1>::__destroy[abi:nn180100](...)&&, std::__1::__variant_detail::__base<(...)1, JSC::StructureTransitionStructureStubClearingWatchpoint, JSC::AdaptiveValueStructureStubClearingWatchpoint>&>
  • decltype(...) std::__1::__variant_detail::__visitation::__base::__dispatcher<0ul>::__dispatch[abi:nn180100]<std::__1::__variant_detail::__dtor<std::__1::__variant_detail::__traits<JSC::StructureTransitionStructureStubClearingWatchpoint, JSC::AdaptiveValueStructureStubClearingWatchpoint>, (...)1>::__destroy[abi:nn180100](...)&&, std::__1::__variant_detail::__base<(...)1, JSC::StructureTransitionStructureStubClearingWatchpoint, JSC::AdaptiveValueStructureStubClearingWatchpoint>&>
  • JSC::PolymorphicAccessJITStubRoutine::observeZeroRefCountImpl
  • JSC::InlineCacheHandler::~InlineCacheHandler
  • JSC::InlineCacheHandler::~InlineCacheHandler
  • JSC::StructureStubInfo::~StructureStubInfo
  • JSC::CodeBlock::~CodeBlock
  • void JSC::MarkedBlock::Handle::specializedSweep<true, (...)1, (...)1, (...)1, (...)0, (...)1, (...)1, JSC::DefaultDestroyFunc>
  • void JSC::MarkedBlock::Handle::finishSweepKnowingHeapCellType<JSC::DefaultDestroyFunc>(...)::'lambda'(...)

Features: jsc, Bun.stdin, dotenv, fetch, spawn, transpiler_cache, tsconfig_paths, tsconfig

Sentry Issue: BUN-5MM


Bun v1.1.33 (247456b) on macos aarch64 [RunCommand]

Segmentation fault at address 0x112DB57A00020

  • 1 unknown/js code
  • JSC::ObjectPropertyConditionSet::numberOfConditionsWithKind
  • JSC::ObjectPropertyConditionSet::numberOfConditionsWithKind
  • bool JSC::OpRet::emitImpl<(...)1, true, JSC::BytecodeGenerator>
  • JSC::Heap::webAssemblyFunctionSpaceSlow
  • WTF::SharedTaskFunctor<void (...), JSC::FTL::Output::doubleTrunc(...)::$_0>::~SharedTaskFunctor
  • JSC::BlockDirectory::findBlockForAllocation
  • JSC::FTL::slowPathCallThunkGenerator
  • JSC::GCClient::Heap::javaScriptCallFrameSpaceSlow
  • void std::__1::__introsort<std::__1::_ClassicAlgPolicy, JSC::HeapSnapshot::finalize()::$_0&, JSC::HeapSnapshotNode*, false>

Features: Bun.stdin, dotenv, fetch, jsc, spawn, tsconfig

Sentry Issue: BUN-7R5

par5ul1 added the "crash" label (An issue that could cause a crash) Oct 30, 2024
github-actions bot added the "macOS" label (An issue that occurs on macOS) Oct 30, 2024
Contributor

@par5ul1, thank you for reporting this crash. The latest version of Bun is v1.1.33, but this crash was reported on Bun v1.1.29.

Are you able to reproduce this crash on the latest version of Bun?

bun upgrade

For Bun's internal tracking, this issue is BUN-7R1.

Author

par5ul1 commented Oct 30, 2024

I updated the issue with a crash report from bun@latest.

Jarred-Sumner added the "confirmed bug" label (We can reproduce this issue) Oct 31, 2024
Jarred-Sumner added a commit that referenced this issue Oct 31, 2024
Collaborator

Jarred-Sumner commented Oct 31, 2024

@par5ul1 thanks for the reproduction. The fix will land in Bun v1.1.34.

Regarding #13745, set idleTimeout: 0 in Bun.serve() if you want to disable the default request timeout of 10 seconds. In node:http, this timeout is disabled by default.
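
For anyone landing here from #13745, a minimal sketch of what that setting might look like. The helper names (buildServeOptions, startServer) and the port are made up for illustration and are not from the thread; only idleTimeout and Bun.serve() come from the comment above. The server is started only when the script runs under Bun.

```typescript
// Illustrative only: buildServeOptions, startServer, and the port number are
// hypothetical; idleTimeout is the Bun.serve() option named in the comment above.
type ServeOptions = {
  port: number;
  idleTimeout: number; // seconds; 0 disables the idle timeout
  fetch: (req: Request) => Response | Promise<Response>;
};

// Builds the options object separately so it can be inspected without a server.
function buildServeOptions(port: number): ServeOptions {
  return {
    port,
    idleTimeout: 0, // opt out of the default 10-second timeout
    fetch: () => new Response("ok"),
  };
}

// Starts the server only when running under Bun; returns undefined elsewhere.
function startServer(port: number): unknown {
  const bun = (globalThis as any).Bun;
  if (bun === undefined) return undefined;
  return bun.serve(buildServeOptions(port));
}
```

Calling startServer(3000) under Bun should then start a server that does not close connections for being idle, which is roughly the node:http default described above.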

Author

par5ul1 commented Oct 31, 2024

@Jarred-Sumner thanks for the speedy fix. Hope to see it land soon. If you have the energy for it, could I get a TL;DR of what the issue ended up being?

Collaborator

Jarred-Sumner commented Oct 31, 2024 via email

Author

par5ul1 commented Oct 31, 2024

Makes sense!

190n pushed a commit that referenced this issue Nov 2, 2024