
Parallel renames #167

Open
jehna opened this issue Oct 18, 2024 · 7 comments
Labels: enhancement (New feature or request)

Comments

@jehna (Owner) commented Oct 18, 2024

My thought is that doing the renames in parallel should speed up the process considerably. Especially if the user has enough OpenAI quota, large files could be processed much faster by parallelizing the work.

Local inference should also be runnable in parallel, if the user has a good enough GPU at hand.

One big problem is that I've gotten the best results when applying renames from the outermost scope inward – so say we have:

```js
function a() {
  const b = (c) => {
    return c * c
  }
}
```

It seems that running the rename in order a -> b -> c yields much better results than running c -> b -> a.

But if we had multiple same-level identifiers like:

```js
function a() {
  function b() {}
  function c() {}
  function d() {}
}
```

At least in theory, it would be possible to rename `a` first and then `[b, c, d]` in parallel and still get feasible results.

In the best-case scenario there would be a second LLM step to check that all variable names still make sense after the parallel run has finished.
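
A minimal sketch of what that scheduling could look like, assuming the traversal can report each identifier's scope depth and that a `renameOne` LLM helper exists (both are hypothetical here, not humanify's current API):

```ts
// Hypothetical sketch: process scopes outermost-first, siblings in parallel.
type Identifier = { name: string; scopeDepth: number; context: string };

async function renameByLevel(
  identifiers: Identifier[],
  renameOne: (name: string, context: string) => Promise<string>
): Promise<Map<string, string>> {
  const renames = new Map<string, string>();
  const maxDepth = Math.max(0, ...identifiers.map((i) => i.scopeDepth));
  for (let depth = 0; depth <= maxDepth; depth++) {
    // Levels run sequentially, so `a` is renamed before `b`, `c` and `d`...
    const level = identifiers.filter((i) => i.scopeDepth === depth);
    // ...but siblings within one level are fired off concurrently.
    await Promise.all(
      level.map(async (id) => {
        renames.set(id.name, await renameOne(id.name, id.context));
      })
    );
  }
  return renames;
}
```

This only shows the scheduling shape; in practice the code would need to be re-parsed between levels so inner identifiers see the already-applied outer renames in their surrounding context.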

@jehna added the enhancement label on Oct 18, 2024
@jehna (Owner, Author) commented Oct 18, 2024

Need to implement proper request throttling and retry logic when doing this
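
A hedged sketch of what the retry half could look like, assuming errors thrown by the OpenAI client expose a numeric `status` field – exponential backoff with jitter on 429s:

```ts
// Hypothetical retry helper: backs off exponentially when the API rate-limits us.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1_000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = (error as { status?: number }).status;
      if (status !== 429 || attempt >= maxAttempts) throw error;
      // Delay doubles per attempt, with jitter so parallel workers desynchronize.
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

It would wrap each API call, e.g. `await withRetry(() => client.chat.completions.create(prompt))`.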

@0xdevalias

> Need to implement proper request throttling and retry logic when doing this

Related:

This seems to be the section of code for implementing better throttling/retry logic (at least for the openai plugin):

@brianjenkins94 commented Oct 21, 2024

Resumability would also be a good thing to consider.

@0xdevalias

> Resumability would also be a good thing to consider.

Some of the discussion in the following issue could tangentially relate to resumability; specifically, if a consistent 'map' of renames were created, it could also show which sections of the code haven't yet been processed:
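
As a rough sketch of that idea, persisting completed renames to a JSON checkpoint so a crashed run can resume where it left off (the file path, map shape and `renameOne` helper are all assumptions, not humanify's actual format):

```ts
import { readFile, writeFile } from "node:fs/promises";

// Hypothetical checkpoint: original name -> new name.
type RenameMap = Record<string, string>;

async function loadRenameMap(path: string): Promise<RenameMap> {
  try {
    return JSON.parse(await readFile(path, "utf8"));
  } catch {
    return {}; // no checkpoint yet, start fresh
  }
}

async function renameWithCheckpoints(
  names: string[],
  renameOne: (name: string) => Promise<string>,
  checkpointPath = "humanify-renames.json"
): Promise<RenameMap> {
  const done = await loadRenameMap(checkpointPath);
  for (const name of names) {
    if (name in done) continue; // already renamed in a previous run
    done[name] = await renameOne(name);
    // Flush after every rename so a crash loses at most one result.
    await writeFile(checkpointPath, JSON.stringify(done, null, 2));
  }
  return done;
}
```

A map like this would double as the progress view mentioned above: any identifier missing from it hasn't been processed yet.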

@brianjenkins94 commented Oct 23, 2024

I'm trying to process a pretty huge file and just ran into this:

```
RateLimitError: 429 Rate limit reached for gpt-4o-mini in organization org-abcdefghijklmnopqrstuvwx on requests per day (RPD): Limit 10000, Used 10000
```

I'm going to see about improving the rate limiting here:

```diff
// /src/plugins/openai/openai-rename.ts
+import Bottleneck from "bottleneck/light";

+// The RPD quota above is 10,000 requests/day, i.e. a budget of
+// Math.floor(10_000 / 24) ≈ 416 requests/hour.
+const limiter = new Bottleneck({
+  reservoir: Math.floor(10_000 / 24),
+  reservoirRefreshAmount: Math.floor(10_000 / 24),
+  reservoirRefreshInterval: 3_600_000 // refill the budget every hour
+});

export function openaiRename({
  apiKey,
  baseURL,
  model,
  contextWindowSize
}: {
  apiKey: string;
  baseURL: string;
  model: string;
  contextWindowSize: number;
}) {
  const client = new OpenAI({ apiKey, baseURL });

  return async (code: string): Promise<string> => {
    return await visitAllIdentifiers(
      code,
      async (name, surroundingCode) => {
        verbose.log(`Renaming ${name}`);
        verbose.log("Context: ", surroundingCode);

+        // Each identifier rename is one API request, so the limiter has to
+        // gate this call rather than the whole-file run.
+        const response = await limiter.schedule(() =>
+          client.chat.completions.create(
+            toRenamePrompt(name, surroundingCode, model)
+          )
+        );
        const result = response.choices[0].message?.content;
        if (!result) {
          throw new Error("Failed to rename", { cause: response });
        }
        const renamed = JSON.parse(result).newName;

        verbose.log(`Renamed to ${renamed}`);

        return renamed;
      },
      contextWindowSize,
      showPercentage
    );
  };
}
```

@0xdevalias

Context from other thread:

> Aside: Can I run humanify against multiple files simultaneously? Or would that run the risk of making requests too fast?
>
> In trying to produce benchmark results for #172, I have determined that simultaneous runs cause 429s and cause the application to crash.
>
> Just closing the loop.

Originally posted by @brianjenkins94 in #67 (comment)

@neoOpus commented Oct 24, 2024

> Context from other thread:
>
> Aside: Can I run humanify against multiple files simultaneously? Or would that run the risk of making requests too fast?
>
> In trying to produce benchmark results for #172, I have determined that simultaneous runs cause 429s and cause the application to crash.
>
> Just closing the loop.
>
> Originally posted by @brianjenkins94 in #67 (comment)

Could we have a PR with the majority of the fixes, even if it's not production-ready? I paused my work after losing track of the tasks and getting discouraged by the errors, compounded by a sluggish machine. I still want to deobfuscate some Chrome extensions to modify them or understand how they work.
