
Parallel renames #167

Open
jehna opened this issue Oct 18, 2024 · 7 comments
Labels: enhancement (New feature or request)

Comments

@jehna (Owner) commented Oct 18, 2024

My thought is that doing the renames in parallel should speed up the process considerably. Especially if the user has enough OpenAI quota, large files could be processed much faster by parallelizing the work.

Local inference should also be runnable in parallel, if the user has a good enough GPU at hand.

One big problem is that I've gotten the best results when applying renames from the outermost scope inward – so say we have:

```js
function a() {
  const b = (c) => {
    return c * c
  }
}
```

It seems that running the rename in order a -> b -> c yields much better results than running c -> b -> a.

But if we had multiple same-level identifiers like:

```js
function a() {
  function b() {}
  function c() {}
  function d() {}
}
```

At least in theory, it would be possible to rename `a` first and then `[b, c, d]` in parallel and still get feasible results.

In the best-case scenario there would be a second LLM step to check that all variable names still make sense after the parallel run has finished.
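
A minimal sketch of what that scheduling could look like, assuming the traversal can report each identifier's scope depth and that a `renameOne` LLM helper exists (both are hypothetical here, not humanify's current API):

```ts
// Hypothetical sketch: process scopes outermost-first, siblings in parallel.
type Identifier = { name: string; scopeDepth: number; context: string };

async function renameByLevel(
  identifiers: Identifier[],
  renameOne: (name: string, context: string) => Promise<string>
): Promise<Map<string, string>> {
  const renames = new Map<string, string>();
  const maxDepth = Math.max(0, ...identifiers.map((i) => i.scopeDepth));
  for (let depth = 0; depth <= maxDepth; depth++) {
    // Levels run sequentially, so `a` is renamed before `b`, `c` and `d`...
    const level = identifiers.filter((i) => i.scopeDepth === depth);
    // ...but siblings within one level are fired off concurrently.
    await Promise.all(
      level.map(async (id) => {
        renames.set(id.name, await renameOne(id.name, id.context));
      })
    );
  }
  return renames;
}
```

This only shows the scheduling shape; in practice the code would need to be re-parsed between levels so inner identifiers see the already-applied outer renames in their surrounding context.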

@jehna added the enhancement label on Oct 18, 2024
@jehna (Owner, Author) commented Oct 18, 2024

Need to implement proper request throttling and retry logic when doing this
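
A hedged sketch of what the retry half could look like, assuming errors thrown by the OpenAI client expose a numeric `status` field – exponential backoff with jitter on 429s:

```ts
// Hypothetical retry helper: backs off exponentially when the API rate-limits us.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1_000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = (error as { status?: number }).status;
      if (status !== 429 || attempt >= maxAttempts) throw error;
      // Delay doubles per attempt, with jitter so parallel workers desynchronize.
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

It would wrap each API call, e.g. `await withRetry(() => client.chat.completions.create(prompt))`.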

@0xdevalias

> Need to implement proper request throttling and retry logic when doing this

Related:

This seems to be the section of code for implementing better throttling/retry logic (at least for the openai plugin):

@brianjenkins94 commented Oct 21, 2024

Resumability would also be a good thing to consider.

@0xdevalias

> Resumability would also be a good thing to consider.

Some of the discussion in the following issue could tangentially relate to resumability; specifically, if a consistent 'map' of renames were created, it could also show which sections of the code haven't yet been processed:
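
As a rough sketch of that idea, persisting completed renames to a JSON checkpoint so a crashed run can resume where it left off (the file path, map shape and `renameOne` helper are all assumptions, not humanify's actual format):

```ts
import { readFile, writeFile } from "node:fs/promises";

// Hypothetical checkpoint: original name -> new name.
type RenameMap = Record<string, string>;

async function loadRenameMap(path: string): Promise<RenameMap> {
  try {
    return JSON.parse(await readFile(path, "utf8"));
  } catch {
    return {}; // no checkpoint yet, start fresh
  }
}

async function renameWithCheckpoints(
  names: string[],
  renameOne: (name: string) => Promise<string>,
  checkpointPath = "humanify-renames.json"
): Promise<RenameMap> {
  const done = await loadRenameMap(checkpointPath);
  for (const name of names) {
    if (name in done) continue; // already renamed in a previous run
    done[name] = await renameOne(name);
    // Flush after every rename so a crash loses at most one result.
    await writeFile(checkpointPath, JSON.stringify(done, null, 2));
  }
  return done;
}
```

A map like this would double as the progress view mentioned above: any identifier missing from it hasn't been processed yet.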

@brianjenkins94 commented Oct 23, 2024

I'm trying to process a pretty huge file and just ran into this:

```
RateLimitError: 429 Rate limit reached for gpt-4o-mini in organization org-abcdefghijklmnopqrstuvwx on requests per day (RPD): Limit 10000, Used 10000
```

I'm going to see about improving the rate limiting here:

```diff
// /src/plugins/openai/openai-rename.ts
+import Bottleneck from "bottleneck/light";

+// The RPD quota above is 10,000 requests/day, i.e. a budget of
+// Math.floor(10_000 / 24) ≈ 416 requests/hour.
+const limiter = new Bottleneck({
+  reservoir: Math.floor(10_000 / 24),
+  reservoirRefreshAmount: Math.floor(10_000 / 24),
+  reservoirRefreshInterval: 3_600_000 // refill the budget every hour
+});

export function openaiRename({
  apiKey,
  baseURL,
  model,
  contextWindowSize
}: {
  apiKey: string;
  baseURL: string;
  model: string;
  contextWindowSize: number;
}) {
  const client = new OpenAI({ apiKey, baseURL });

  return async (code: string): Promise<string> => {
    return await visitAllIdentifiers(
      code,
      async (name, surroundingCode) => {
        verbose.log(`Renaming ${name}`);
        verbose.log("Context: ", surroundingCode);

+        // Each identifier rename is one API request, so the limiter has to
+        // gate this call rather than the whole-file run.
+        const response = await limiter.schedule(() =>
+          client.chat.completions.create(
+            toRenamePrompt(name, surroundingCode, model)
+          )
+        );
        const result = response.choices[0].message?.content;
        if (!result) {
          throw new Error("Failed to rename", { cause: response });
        }
        const renamed = JSON.parse(result).newName;

        verbose.log(`Renamed to ${renamed}`);

        return renamed;
      },
      contextWindowSize,
      showPercentage
    );
  };
}
```

@0xdevalias

Context from other thread:

> Aside: Can I run humanify against multiple files simultaneously? Or would that run the risk of making requests too fast?
>
> In trying to produce benchmark results for #172, I have determined that simultaneous runs cause 429s and cause the application to crash.
>
> Just closing the loop.

Originally posted by @brianjenkins94 in #67 (comment)

@neoOpus commented Oct 24, 2024

> Context from other thread:
>
> Aside: Can I run humanify against multiple files simultaneously? Or would that run the risk of making requests too fast?
>
> In trying to produce benchmark results for #172, I have determined that simultaneous runs cause 429s and cause the application to crash.
>
> Just closing the loop.
>
> Originally posted by @brianjenkins94 in #67 (comment)

Could we have a PR with the majority of the fixes, even if it's not production-ready? I paused my work after losing track of the tasks and getting discouraged by the errors, compounded by a sluggish machine. I still want to deobfuscate some Chrome extensions to modify them or understand how they work.
