Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added cleanup script for hanging Chrome processes and temp files #2173

Merged
merged 5 commits into from
Mar 26, 2024
Merged

Added cleanup script for hanging Chrome processes and temp files #2173

merged 5 commits into from
Mar 26, 2024

Conversation

cedricroijakkers
Copy link
Contributor

@cedricroijakkers cedricroijakkers commented Mar 22, 2024

User description

Description

Added a script (running as a daemon of sorts under supervisord) to clean up files and processes left there by the Chrome browser.

Motivation and Context

We've been running a Selenium grid for quite some time, using it to execute all kinds of automated monitoring tasks on websites, both running in the internal network and on the internet.

Since we're using the Chrome browser exclusively and our grids are long-running, we've noticed that after some time two types of items remain in the containers (consuming PIDs and disk space):

  • Hanging Chrome browser processes. Most of our jobs run for a few minutes, yet after many hours, some Chrome processes remain active.
  • Temporary files of the format /tmp/.com.google.Chrome.*, after some time these consume quite a lot of disk space, blowing up the container file system.

I have created a shell script that cleans up these two items which runs once an hour. Temporary files on disk will be removed immediately, Chrome processes after they have been active for over 2 hours. This is what we are doing in our setup using an ugly hack by doing it as part of the Docker healthcheck, and it solved all our capacity problems.

The script has only been added to the Chrome node image, since I cannot tell if the same issue is also present on other browsers. Even if it is, the fix will be different since the browser is different, so the change is not applicable there.

A new script has been added under /opt/bin/chrome-cleanup.sh in the container, and a supervisord service is added using configuration file /etc/supervisor/conf.d/chrome-cleanup.conf.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Type

Bug fix, Enhancement


Description

  • Adds a new shell script (chrome-cleanup.sh) to clean up hanging Chrome processes and temporary files in the NodeChrome Docker image.
  • The cleanup script targets Chrome temp files older than 1 day and processes older than 2 hours for termination.
  • The script is integrated into the Docker image via the Dockerfile and is managed by supervisord (chrome-cleanup.conf), ensuring it runs continuously as a daemon.
  • This change addresses issues with lingering Chrome processes and bloated temp directories in long-running Selenium grid containers.

Changes walkthrough

Relevant files
Enhancement
chrome-cleanup.sh
Add Chrome Cleanup Daemon Script                                                 

NodeChrome/chrome-cleanup.sh

  • Adds a script to delete Chrome temp files and kill hanging Chrome
    processes.
  • Temp files older than 1 day and processes older than 2 hours are
    targeted.
  • Script runs in an infinite loop, with a cleanup cycle every hour.
  • +31/-0   
    Configuration changes
    Dockerfile
    Integrate Chrome Cleanup Script into Docker Image               

    NodeChrome/Dockerfile

  • Adds the Chrome cleanup script and supervisord configuration file to
    the Docker image.
  • +6/-0     
    chrome-cleanup.conf
    Supervisord Configuration for Chrome Cleanup                         

    NodeChrome/chrome-cleanup.conf

  • Adds a supervisord configuration for the Chrome cleanup script.
  • Configures logging, autostart, and restart policies for the cleanup
    script.
  • +20/-0   

    PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    @CLAassistant
    Copy link

    CLAassistant commented Mar 22, 2024

    CLA assistant check
    All committers have signed the CLA.

    Copy link

    PR Description updated to latest commit (b00f2f7)

    Copy link

    PR Review

    ⏱️ Estimated effort to review [1-5]

    2, because the changes are straightforward and localized to a specific functionality (cleanup script and its supervisord configuration). The script logic is simple, and the supervisord configuration is standard.

    🧪 Relevant tests

    No

    🔍 Possible issues

    Hardcoded Paths: The script assumes that Chrome temporary files are located in /tmp. This might not be the case in all environments or future Chrome versions.

    Process Killing Method: Using kill -9 to terminate processes can be unsafe for system stability. It's recommended to try a softer termination method first, such as SIGTERM, and fall back to SIGKILL only if necessary.

    Error Handling: The script lacks error handling, particularly for the find and kill commands. This could lead to silent failures in some scenarios.

    Efficiency: The script wakes up every hour to perform cleanup, which might not be the most efficient approach. Consider triggering cleanup based on system events or resource thresholds.

    🔒 Security concerns

    No


    ✨ Review tool usage guide:

    Overview:
    The review tool scans the PR code changes, and generates a PR review. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.
    When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

    /review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
    

    With a configuration file, use the following template:

    [pr_reviewer]
    some_config1=...
    some_config2=...
    
    Utilizing extra instructions

    The review tool can be configured with extra instructions, which can be used to guide the model to a feedback tailored to the needs of your project.

    Be specific, clear, and concise in the instructions. With extra instructions, you are the prompter. Specify the relevant sub-tool, and the relevant aspects of the PR that you want to emphasize.

    Examples for extra instructions:

    [pr_reviewer] # /review #
    extra_instructions="""
    In the 'possible issues' section, emphasize the following:
    - Does the code logic cover relevant edge cases?
    - Is the code logic clear and easy to understand?
    - Is the code logic efficient?
    ...
    """
    

    Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.

    How to enable\disable automation
    • When you first install PR-Agent app, the default mode for the review tool is:
    pr_commands = ["/review", ...]
    

    meaning the review tool will run automatically on every PR, with the default configuration.
    Edit this field to enable/disable the tool, or to change the used configurations

    Auto-labels

    The review tool can auto-generate two specific types of labels for a PR:

    • a possible security issue label, that detects possible security issues (enable_review_labels_security flag)
    • a Review effort [1-5]: x label, where x is the estimated effort to review the PR (enable_review_labels_effort flag)
    Extra sub-tools

    The review tool provides a collection of possible feedbacks about a PR.
    It is recommended to review the possible options, and choose the ones relevant for your use case.
    Some of the feature that are disabled by default are quite useful, and should be considered for enabling. For example:
    require_score_review, require_soc2_ticket, require_can_be_split_review, and more.

    Auto-approve PRs

    By invoking:

    /review auto_approve
    

    The tool will automatically approve the PR, and add a comment with the approval.

    To ensure safety, the auto-approval feature is disabled by default. To enable auto-approval, you need to actively set in a pre-defined configuration file the following:

    [pr_reviewer]
    enable_auto_approval = true
    

    (this specific flag cannot be set with a command line argument, only in the configuration file, committed to the repository)

    You can also enable auto-approval only if the PR meets certain requirements, such as that the estimated_review_effort is equal or below a certain threshold, by adjusting the flag:

    [pr_reviewer]
    maximal_review_effort = 5
    
    More PR-Agent commands

    To invoke the PR-Agent, add a comment using one of the following commands:

    • /review: Request a review of your Pull Request.
    • /describe: Update the PR title and description based on the contents of the PR.
    • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    • /ask <QUESTION>: Ask a question about the PR.
    • /update_changelog: Update the changelog based on the PR's contents.
    • /add_docs 💎: Generate docstring for new components introduced in the PR.
    • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
    • /analyze 💎: Automatically analyzes the PR, and presents changes walkthrough for each component.

    See the tools guide for more details.
    To list the possible configuration parameters, add a /config comment.

    See the review usage page for a comprehensive guide on using this tool.

    Copy link

    qodo-merge-pro bot commented Mar 22, 2024

    PR Code Suggestions

    CategorySuggestions                                                                                                                                                       
    Enhancement
    Improve process termination to allow graceful shutdown before using kill -9.

    Using kill -9 to terminate processes can be unsafe as it does not allow the process to
    clean up after itself. Consider using kill alone first to send a SIGTERM, allowing the
    process to terminate gracefully, and then use kill -9 only if the process does not
    terminate after a timeout.

    NodeChrome/chrome-cleanup.sh [11]

    -ps -e -o pid,etimes,command | grep -v grep | grep chrome/chrome | awk '{if($2>1200) print $0}' | awk '{print $1}' | xargs -r kill -9
    +for pid in $(ps -e -o pid,etimes,command | grep -v grep | grep chrome/chrome | awk '{if($2>1200) print $1}'); do
    +  kill $pid
    +  sleep 5
    +  if kill -0 $pid 2>/dev/null; then
    +    echo "Process $pid did not terminate, using kill -9."
    +    kill -9 $pid
    +  fi
    +done
     
    Add a mechanism to gracefully exit the infinite loop.

    Using an infinite loop with while : is not recommended without a clear exit condition.
    Consider adding a mechanism to gracefully exit the loop, such as trapping a signal.

    NodeChrome/chrome-cleanup.sh [16]

    +trap "echo 'Signal caught, exiting'; exit" SIGINT SIGTERM
     while :
     
    Best practice
    Quote variables to prevent globbing and word splitting.

    It's recommended to quote variables in bash to prevent globbing and word splitting. This
    is especially important when dealing with file paths that may contain spaces or special
    characters.

    NodeChrome/chrome-cleanup.sh [5]

    -find /tmp -name ".com.google.Chrome.*" -type d -mtime +1 -exec rm -rf {} +
    +find /tmp -name ".com.google.Chrome.*" -type d -mtime +1 -exec rm -rf "{}" +
     
    Add a safety check to ensure the script is running as root.

    Consider adding a safety check to ensure the script is running as root before attempting
    to delete files or kill processes, as these operations typically require root privileges.

    NodeChrome/chrome-cleanup.sh [1]

     #!/bin/bash
    +if [ "$(id -u)" != "0" ]; then
    +  echo "This script must be run as root" 1>&2
    +  exit 1
    +fi
     
    Maintainability
    Separate cleanup functions into their own scripts for better maintainability.

    To improve readability and maintainability, consider separating the cleanup functions into
    their own scripts and calling them from this main script. This modular approach makes it
    easier to manage individual tasks.

    NodeChrome/chrome-cleanup.sh [3-12]

    -cleanup_tmp_chrome_files() {
    +# Assuming separate scripts named cleanup_tmp_chrome_files.sh and cleanup_stuck_chrome_processes.sh
    +./cleanup_tmp_chrome_files.sh
     ...
    -cleanup_stuck_chrome_processes() {
    +./cleanup_stuck_chrome_processes.sh
     

    ✨ Improve tool usage guide:

    Overview:
    The improve tool scans the PR code changes, and automatically generates suggestions for improving the PR code. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.
    When commenting, to edit configurations related to the improve tool (pr_code_suggestions section), use the following template:

    /improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...
    

    With a configuration file, use the following template:

    [pr_code_suggestions]
    some_config1=...
    some_config2=...
    
    Enabling\disabling automation

    When you first install the app, the default mode for the improve tool is:

    pr_commands = ["/improve --pr_code_suggestions.summarize=true", ...]
    

    meaning the improve tool will run automatically on every PR, with summarization enabled. Delete this line to disable the tool from running automatically.

    Utilizing extra instructions

    Extra instructions are very important for the improve tool, since they enable to guide the model to suggestions that are more relevant to the specific needs of the project.

    Be specific, clear, and concise in the instructions. With extra instructions, you are the prompter. Specify relevant aspects that you want the model to focus on.

    Examples for extra instructions:

    [pr_code_suggestions] # /improve #
    extra_instructions="""
    Emphasize the following aspects:
    - Does the code logic cover relevant edge cases?
    - Is the code logic clear and easy to understand?
    - Is the code logic efficient?
    ...
    """
    

    Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.

    A note on code suggestions quality
    • While the current AI for code is getting better and better (GPT-4), it's not flawless. Not all the suggestions will be perfect, and a user should not accept all of them automatically.
    • Suggestions are not meant to be simplistic. Instead, they aim to give deep feedback and raise questions, ideas and thoughts to the user, who can then use his judgment, experience, and understanding of the code base.
    • Recommended to use the 'extra_instructions' field to guide the model to suggestions that are more relevant to the specific needs of the project, or use the custom suggestions 💎 tool
    • With large PRs, best quality will be obtained by using 'improve --extended' mode.
    More PR-Agent commands

    To invoke the PR-Agent, add a comment using one of the following commands:

    • /review: Request a review of your Pull Request.
    • /describe: Update the PR title and description based on the contents of the PR.
    • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    • /ask <QUESTION>: Ask a question about the PR.
    • /update_changelog: Update the changelog based on the PR's contents.
    • /add_docs 💎: Generate docstring for new components introduced in the PR.
    • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
    • /analyze 💎: Automatically analyzes the PR, and presents changes walkthrough for each component.

    See the tools guide for more details.
    To list the possible configuration parameters, add a /config comment.

    See the improve usage page for a more comprehensive guide on using this tool.

    @VietND96 VietND96 requested a review from diemol March 22, 2024 14:38
    Copy link
    Member

    @diemol diemol left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    This is probably a good idea. Do you have time to add the same to Edge and Firefox?

    @cedricroijakkers
    Copy link
    Contributor Author

    This is probably a good idea. Do you have time to add the same to Edge and Firefox?

    I would have to investigate if and where those browsers save their files, but yes. I will check somewhere next week.

    @VietND96
    Copy link
    Member

    Do you think the older time in seconds (default 1200) and the daemon interval in seconds (default 3600) are able to set different values via an ENV var?

    @cedricroijakkers
    Copy link
    Contributor Author

    Do you think the older time in seconds (default 1200) and the daemon interval in seconds (default 3600) are able to set different values via an ENV var?

    While creating this PR I was actually thinking that too, yes. Maybe even create an ENV var to disable the cleanup alltogether? I will have a look and update the PR.

    @VietND96
    Copy link
    Member

    Yes, ENV var to enable/disable is also fine. I see supervisord is used to start the script, so you can disable it in .conf something like

    command=bash -c "if [ ${SE_VIDEO_INTERNAL_UPLOAD} = "true" ]; then /opt/bin/upload.sh; fi"
    autostart=%(ENV_SE_VIDEO_INTERNAL_UPLOAD)s
    autorestart=%(ENV_SE_VIDEO_INTERNAL_UPLOAD)s

    - Allowed cleanup to be enabled and/or configured via environment variables
    - Updated documentation with new options
    @cedricroijakkers
    Copy link
    Contributor Author

    I've updated the PR with your comments:

    • Cleanup script added for Firefox and Edge (and cleaned up code a little bit)
    • Added env vars to enable/disable the cleanup, and when enabled configure timings and parameters
    • Added new environment vars to the documentation

    @VietND96
    Copy link
    Member

    LGTM! How about you, @diemol ?

    Copy link
    Member

    @diemol diemol left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Should we leave SE_ENABLE_BROWSER_LEFTOVERS_CLEANUP enabled by default?

    Several folks use these images for web scraping; sessions could last longer than an hour.

    What do you all think?

    NodeChrome/Dockerfile Outdated Show resolved Hide resolved
    @cedricroijakkers
    Copy link
    Contributor Author

    Should we leave SE_ENABLE_BROWSER_LEFTOVERS_CLEANUP enabled by default?

    Several folks use these images for web scraping; sessions could last longer than an hour.

    Fair point. I thought about this as well. Maybe it is better to err on the side of caution, and leave the cleanup disabled by default.

    I have changed the default value to false and updated the Readme accordingly.

    Maybe these can be placed in NodeBase? We can avoid some duplication.

    Done!

    Copy link
    Member

    @diemol diemol left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Thank you, @cedricroijakkers!

    @VietND96 feel free to merge whenever you have time. Thanks!

    @VietND96 VietND96 self-requested a review March 26, 2024 14:42
    @VietND96 VietND96 merged commit 3909301 into SeleniumHQ:trunk Mar 26, 2024
    11 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants