FastSAM (Fast Segment Anything Model) - Ultralytics YOLOv8 Docs #3417

2023-06-27T16:14:46Z

giscus[bot]
bot Jun 27, 2023

FastSAM (Fast Segment Anything Model) - Ultralytics YOLOv8 Docs

Explore the Fast Segment Anything Model (FastSAM), a real-time solution for the segment anything task that leverages a Convolutional Neural Network (CNN) for segmenting any object within an image, guided by user interaction prompts.

https://docs.ultralytics.com/models/fast-sam/

Atomadeus · 2023-07-06T17:36:27Z

Atomadeus
Jul 6, 2023 — with giscus

How can I obtain the information of the masks after image inference?

3 replies

Aravinth-Natarajan Jul 19, 2023

I also want to know this. The docs aren't quite clear on the commands available.

romanvelichkin Nov 15, 2023 — with giscus

Same as with other YOLO models: they are saved in boxes.

results = model("path/to/image.jpg")  # run inference as in the docs example

# parse images in results
for result in results:
    print(result.boxes)

# or index a single image directly
image_num = 0
print(results[image_num].boxes)

pderrenger Feb 6, 2024
Maintainer

@romanvelichkin to obtain the information of the masks after performing inference with a segmentation model, you can access the masks attribute of the Results object. Here's how you can do it:

from ultralytics import YOLO

# Load your trained segmentation model
model = YOLO('path/to/your/segmentation-model.pt')

# Perform inference on an image
results = model('path/to/your/image.jpg')

# Access the masks for the first image in the results
masks = results[0].masks

# Masks attributes
mask_data = masks.data  # The actual mask data, a tensor of shape (N, H, W)
mask_xy = masks.xy      # x,y segments (pixels), a list of segments for each detected instance
mask_xyn = masks.xyn    # x,y segments (normalized), a list of normalized segments for each instance

# You can iterate over the masks and do something with them
for i, mask in enumerate(mask_data):
    print(f"Mask {i} has shape: {mask.shape}")

Remember that the masks attribute is only populated when you use a segmentation model that outputs masks. With a detection-only model, results[0].masks is None, so check for that before accessing its attributes.

For more detailed information on handling the results and the various attributes available, please refer to the Predict section of the Ultralytics YOLOv8 documentation.
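
Once the mask tensor has been moved to the CPU as a NumPy array (for example with `masks.data.cpu().numpy()`), per-instance information such as pixel area and a tight bounding box can be derived from it. The snippet below is only a minimal sketch: `mask_summary` is a hypothetical helper, not part of Ultralytics, and the input is a synthetic array standing in for real model output.

```python
import numpy as np

def mask_summary(mask_data: np.ndarray, threshold: float = 0.5) -> list:
    """Return pixel area and bounding box (x1, y1, x2, y2) for each mask in an (N, H, W) array."""
    summaries = []
    for mask in mask_data:
        binary = mask > threshold
        ys, xs = np.nonzero(binary)
        if ys.size == 0:
            # Empty mask: no pixels above the threshold
            summaries.append({"area": 0, "bbox": None})
            continue
        summaries.append({
            "area": int(binary.sum()),
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
        })
    return summaries

# Synthetic stand-in for masks.data.cpu().numpy(): two 8x8 masks
demo = np.zeros((2, 8, 8), dtype=np.float32)
demo[0, 2:5, 1:4] = 1.0  # a 3x3 block of foreground pixels
summary = mask_summary(demo)
```

The bounding box here uses inclusive pixel indices; adapt it if you need exclusive upper bounds.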

litfish · 2023-07-24T09:08:26Z

litfish
Jul 24, 2023 — with giscus

I'm working on decoding the output:
Float32 1 × 32 × 8400 array .mc
Float32 1 × 32 × 160 × 160 array .p

6 replies

SearchDream Apr 12, 2024

What's the width and height for the grid? I am trying to decode the 8400 data on iOS. Thanks

pderrenger Apr 12, 2024
Maintainer

@SearchDream hey there! 👋 It looks like you're trying to decode an output with 8400 entries. When you work with the raw YOLO output on iOS, the grid size at each detection layer is the input image size divided by that layer's stride.

For a YOLOv8 model, the 8400 entries are the grid cells of all detection layers concatenated. Without knowing the exact model configuration I can't be certain, but here's the general idea:

If your model has an input image size (imgsz) of 640x640 and strides of 8, 16, and 32, the grids are 80x80, 40x40, and 20x20, which gives 6400 + 1600 + 400 = 8400 positions in total.

Decoding this data involves reshaping and interpreting it according to the specifics of your model's output layer. In general, you'd convert these activations into bounding box coordinates, class probabilities, and confidence scores.

In Ultralytics segmentation heads, mc usually denotes the mask coefficients (32 values per position) and p the mask prototypes (32 prototype maps at 160x160), which would match the 1 × 32 × 8400 and 1 × 32 × 160 × 160 shapes above. A mask is then typically obtained by taking the coefficient-weighted sum of the prototypes and applying a sigmoid, after the candidate detections have been filtered with a confidence threshold and non-maximum suppression.

Here's a pseudo-code that might help:

// Assuming mcArray and pArray are your model outputs
let mcArray = fetchMCArray() // 1x32x8400
let pArray = fetchPArray() // 1x32x160x160
// Decode these based on your model specifics!

For a practical implementation, reviewing the model’s specific documentation or the output layer's configuration would be crucial. Unfortunately, FastSAM's documentation focuses more on segmentation tasks and might not directly align with decoding bounding boxes or class predictions from a detection model like YOLO.

I hope this gives a good starting point! If you've got more specific info about the model or need further assistance, feel free to share! Happy coding! 😊
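
To make the grid arithmetic concrete, here is a small sketch that computes the per-stride grid sizes and maps a flat index back to a grid cell. It assumes a square input and the strides 8, 16, and 32 mentioned above; the ordering assumed by `locate` (levels concatenated in stride order, each level row-major) is an assumption about the exported model, not something confirmed in this thread.

```python
def anchor_grid(imgsz: int, strides=(8, 16, 32)) -> list:
    """Return (stride, grid_h, grid_w, cells) per detection level for a square input."""
    return [(s, imgsz // s, imgsz // s, (imgsz // s) ** 2) for s in strides]

def total_anchors(imgsz: int, strides=(8, 16, 32)) -> int:
    """Total number of grid positions across all levels."""
    return sum(cells for *_, cells in anchor_grid(imgsz, strides))

def locate(index: int, imgsz: int, strides=(8, 16, 32)) -> tuple:
    """Map a flat index to (stride, row, col).

    Assumes levels are concatenated in stride order and each level is row-major.
    """
    offset = 0
    for stride, h, w, cells in anchor_grid(imgsz, strides):
        if index < offset + cells:
            row, col = divmod(index - offset, w)
            return stride, row, col
        offset += cells
    raise IndexError(index)

grid_640 = anchor_grid(640)      # 80x80, 40x40, and 20x20 grids
total_640 = total_anchors(640)   # 6400 + 1600 + 400
total_1024 = total_anchors(1024)
```

Under the same assumptions, a cell's center in input pixels would be roughly `((col + 0.5) * stride, (row + 0.5) * stride)`.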

SearchDream Apr 13, 2024

@pderrenger Thanks for your reply. Based on the code I read, there is only one class for FastSAM ("# set to 1 class since SAM has no class predictions"), and based on the mask image, I guess the grid size is 25*25 for each image.

daiyangyang945 Apr 13, 2024

Hello, I'm a beginner in machine learning. I converted yolov8-seg.pt to CoreML format according to the documentation. When I use it on iOS, it returns two arrays, 1053 and p. I want to know how to convert them into a mask.

pderrenger Apr 13, 2024
Maintainer

Hi there! 😊 Welcome to the fascinating world of machine learning. So you've converted the yolov8-seg.pt model to CoreML and got two arrays named 1053 and p. To convert them into a mask for image segmentation on iOS, it's all about interpreting these outputs correctly.

The array 1053 is most likely the detection output rather than a finished mask: for a standard YOLOv8-seg export, each candidate holds 4 box values, the class scores, and 32 mask coefficients. The p array is most likely the set of mask prototypes (32 low-resolution maps) that those coefficients are combined with.

Here's a simplified way to approach this:

Filtering 1053: Apply a confidence threshold and non-maximum suppression to the candidates in 1053 to keep only the confident, non-overlapping detections.
Creating the Mask: For each remaining detection, compute the weighted sum of the prototypes in p using its 32 mask coefficients, apply a sigmoid, and threshold the result (0.5 is a common choice). You can then crop each mask to its box, resize it to the image size, and map it to a color to visualize the segmentation.

In Swift, assuming you've accessed these arrays from the model's output, your code snippet to use them might look somewhat like this:

if let detections = output["1053"] as? MLMultiArray, let prototypes = output["p"] as? MLMultiArray {
    // Filter 'detections', then combine each mask's coefficients with 'prototypes'
    // This part depends on how you wish to apply thresholding and map segments to colors
}

Remember, the exact implementation details will depend on your dataset and how you've trained and exported your model. Feel free to play around with different thresholds to get the best visual results for your segmentation tasks.

Hope this helps, and happy coding! 🚀
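
For reference, here is a minimal NumPy sketch of how YOLOv8-seg style outputs are commonly turned into masks: each detection's mask coefficients are multiplied with the prototype maps, passed through a sigmoid, and thresholded. It assumes the two arrays follow the usual layout (coefficients of shape (N, 32) taken from the detection output, prototypes of shape (32, 160, 160)), which this thread does not confirm for the exported CoreML model. `assemble_masks` is a hypothetical helper name, and the inputs are random stand-ins.

```python
import numpy as np

def assemble_masks(coeffs: np.ndarray, protos: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Combine (N, K) mask coefficients with (K, H, W) prototypes into (N, H, W) boolean masks."""
    k, h, w = protos.shape
    logits = coeffs @ protos.reshape(k, h * w)   # (N, H*W) weighted sums of prototypes
    probs = 1.0 / (1.0 + np.exp(-logits))        # sigmoid
    return (probs > threshold).reshape(-1, h, w)

# Random stand-ins for three kept detections and the prototype output
rng = np.random.default_rng(0)
protos = rng.standard_normal((32, 160, 160)).astype(np.float32)
coeffs = rng.standard_normal((3, 32)).astype(np.float32)
masks = assemble_masks(coeffs, protos)
```

The resulting masks are at prototype resolution, so in practice they would still need to be cropped to each box and resized to the input image size. The same arithmetic can be ported to Swift once the `MLMultiArray` contents are read out.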

Muhammad-Kaleem-Ullah · 2023-08-01T13:00:45Z

Muhammad-Kaleem-Ullah
Aug 1, 2023 — with giscus

Unable to use device=[0, 1, 2, 3] for using GPU resources.

1 reply

pderrenger Feb 6, 2024
Maintainer

@Muhammad-Kaleem-Ullah it seems you're trying to utilize multiple GPUs for your task. To use multiple GPUs in YOLOv8, you should ensure that your system is set up correctly for PyTorch's DataParallel or DistributedDataParallel (DDP). Here's a concise guide to help you:

Make sure you have multiple GPUs available on your system with proper CUDA support.
Verify that PyTorch is recognizing all your GPUs by running torch.cuda.device_count().
When training with YOLOv8, you can pass a list of device indices like device=[0, 1, 2, 3] to distribute the workload across the GPUs.
Ensure that your dataset and batch size are large enough to benefit from multiple GPUs.
If you encounter any issues, check the CUDA version compatibility with PyTorch and ensure that all GPUs are properly installed and configured.

If you continue to face problems, please provide more details about the error message or behavior you're experiencing, and we'll be happy to assist further. 😊👨‍💻

ijunfly · 2023-08-08T07:11:51Z

ijunfly
Aug 8, 2023 — with giscus

Hello, I am a beginner in machine learning models. After converting the FastSAM.pt model to CoreML format, I used it in the Xcode project. The input parameter is an image of [1 * 3 * 1024 * 1024], and the output parameter yields two multidimensional arrays of [1 * 37 * 21504] and [1 * 3 * 256 * 256].
I don't know how to continue processing this data now.

1 reply

pderrenger Feb 6, 2024
Maintainer

@ijunfly hello! It's great to see you're experimenting with FastSAM and integrating it into your Xcode project. After converting the model to CoreML format and running inference, you're left with two outputs: one most likely holding the candidate predictions and the other the data needed to build the segmentation masks.

Here's a brief overview of how you might process these outputs:

Predictions [1 * 37 * 21504]: This output most likely contains one column per grid location (21504 = 128×128 + 64×64 + 32×32 for a 1024 input with strides 8, 16, and 32). Since FastSAM uses a single class, the 37 values per location would be 4 box values, 1 confidence score, and 32 mask coefficients. You'll need to apply a confidence threshold, followed by non-maximum suppression, to keep only the real detections. A threshold of 0.5 is a reasonable starting point, but you can adjust it based on your needs.
Segmentation Masks [1 * 3 * 256 * 256]: This output is most likely the set of mask prototypes rather than one mask per object. Each detected object's mask is built by combining the prototypes with that object's mask coefficients. You can overlay the resulting masks on the original image to visualize the segmentation.

To process these outputs in your application, you might follow these steps:

Thresholding: Apply a confidence threshold to the predictions to keep only the likely detections.
Mask Processing: Build each object's mask from the prototypes and its coefficients, then resize the masks to the original image size if necessary.
Visualization: Overlay the masks on the original image to see the segmented regions.

For more detailed guidance, you can refer to the documentation on post-processing steps for segmentation models. Since you're working with CoreML, you'll also want to look into the CoreML framework's documentation for handling multi-dimensional outputs and integrating them into your app.

Keep experimenting and don't hesitate to reach out if you have more questions! 🚀🧠
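
As a rough illustration of the thresholding step, the sketch below splits a [1 * 37 * 21504] array into boxes, scores, and mask coefficients and keeps only the confident candidates. The channel layout (4 box values as center x, center y, width, height, then 1 score that is already a probability, then 32 mask coefficients) is an assumption based on how single-class YOLOv8-seg style heads are usually laid out; it is not confirmed in this thread. `split_predictions` is a hypothetical helper and the input is synthetic.

```python
import numpy as np

def split_predictions(pred: np.ndarray, conf: float = 0.25):
    """Split a (1, 37, A) array into boxes, scores, and mask coefficients above a confidence.

    Assumed channel layout: 4 box values (cx, cy, w, h), 1 score, 32 mask coefficients.
    """
    rows = pred[0].T                                   # (A, 37): one row per grid location
    boxes, scores, coeffs = rows[:, :4], rows[:, 4], rows[:, 5:]
    keep = scores > conf
    return boxes[keep], scores[keep], coeffs[keep]

# Synthetic stand-in for the model output with two confident candidates
pred = np.zeros((1, 37, 21504), dtype=np.float32)
pred[0, 4, [10, 500]] = 0.9
pred[0, :4, 10] = [100, 120, 40, 60]
boxes, scores, coeffs = split_predictions(pred)
```

Non-maximum suppression would normally follow this step, since neighboring grid locations often predict the same object.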

med-tim · 2024-03-18T23:04:38Z

med-tim
Mar 18, 2024

I am using FastSAM (model file: FastSAM-x.pt) for object segmentation in a project where the input images vary in size. FastSAM attempts to adjust image sizes to be multiples of its maximum stride (32) but occasionally fails, leading to errors.

Here's the typical warning and adjustment process:

WARNING ⚠️ imgsz=[453] must be multiple of max stride 32, updating to [480]

The model automatically updates the image size and usually continues without issues. However, when the original image size is close to a multiple of 32 but not exactly a multiple, FastSAM throws a runtime error after processing hundreds of images successfully. For example:

Image Shape before SAM: (374, 144, 3)
WARNING ⚠️ imgsz=[374] must be multiple of max stride 32, updating to [384]
RuntimeError: expand(torch.FloatTensor{[6]}, size=[]): the number of sizes provided (0) must be greater or equal to the number of dimensions in the tensor (1)

Interestingly, manually resizing the image to the suggested dimensions (e.g., 384 in this case) before passing it to the model does not solve the issue, and the error persists.
Has anyone encountered a similar problem or has a solution to this issue? Any insights or suggestions would be greatly appreciated.

3 replies

pderrenger Mar 19, 2024
Maintainer

Hey there! 👋 It looks like you're encountering an issue with FastSAM when dealing with images that nearly align with the max stride but not exactly. This scenario appears to cause a rare runtime error despite the image being resized correctly.

To handle this, let's make sure the image size is adjusted before the image reaches the model. Manually rounding each dimension up to the next multiple of 32 might help. Here's a quick way in Python to ensure your image dimensions are compatible:

import cv2

# Round each dimension up to the next multiple of 32
height, width = your_image.shape[:2]
new_width = (width + 31) // 32 * 32
new_height = (height + 31) // 32 * 32

# Resize your image accordingly before passing to FastSAM
resized_image = cv2.resize(your_image, (new_width, new_height))

Insert this preprocessing step right before you feed the image to FastSAM. It makes both dimensions exact multiples of 32 up front, so the model no longer has to adjust them itself, which will hopefully bypass the error you encountered.

If the issue persists even with this workaround, it could hint at deeper intricacies within FastSAM's handling of certain image sizes. In that case, sharing your finding as a detailed issue on the official GitHub repo could help the devs pinpoint and resolve this anomaly. 🛠️

Wishing you a smooth continuation with your project! Let me know how it goes or if there’s anything else popping up.
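
If stretching the image is a concern, padding is an alternative to resizing that keeps the original pixels and aspect ratio intact. The sketch below is a swapped-in technique rather than what was suggested above: it pads the bottom and right edges with a constant value until both sides are multiples of 32. `pad_to_stride` is a hypothetical helper, the fill value 114 is an arbitrary gray, and there is no claim that this resolves the runtime error reported in this thread.

```python
import numpy as np

def round_up_to_stride(size: int, stride: int = 32) -> int:
    """Round size up to the nearest multiple of stride."""
    return (size + stride - 1) // stride * stride

def pad_to_stride(image: np.ndarray, stride: int = 32, value: int = 114) -> np.ndarray:
    """Pad an (H, W, C) image on the bottom and right so both sides are multiples of stride."""
    h, w = image.shape[:2]
    pad_h = round_up_to_stride(h, stride) - h
    pad_w = round_up_to_stride(w, stride) - w
    return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)),
                  mode="constant", constant_values=value)

# Same shape as the failing example in the question: (374, 144, 3)
frame = np.zeros((374, 144, 3), dtype=np.uint8)
padded = pad_to_stride(frame)
```

Because the padding is only added at the bottom and right, mask coordinates in the padded image map directly back to the original image without any offset.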

med-tim Mar 20, 2024

Hi! Thanks for the code snippet for manual adjustment. I implemented it, and the warnings disappeared. I was able to process an initial set of images from a video, confirming the reshaping with these print statements:

Cropped Image Shape before SAM: (455, 187, 3)
Cropped Image Shape after reshape to go into model: (480, 192, 3)

The images were indeed updated and passed into the model without issues. However, after processing about 20 frames, I encountered the same error again, despite the reshaping. This time the error was triggered far earlier than previously when I was having the model itself auto resize the images. This occurred even for a cropped image size (455, 187, 3) that had previously worked after reshaping. The error message was:

RuntimeError: expand(torch.FloatTensor{[2]}, size=[]): the number of sizes provided (0) must be greater or equal to the number of dimensions in the tensor (1)

This error seems to arise in multiple different instances from frames cropped from different videos. Here’s how I’m invoking the model:

def runSAM(new_frame, cropped_image, x1, y1, x2, y2):
    frame = new_frame
    source = cropped_image
    width = source.shape[1]
    height = source.shape[0]

    new_width = (width + 31) // 32 * 32
    new_height = (height + 31) // 32 * 32

    source = cv2.resize(source, (new_width, new_height))
    print('Cropped Image Shape after reshape to go into model:', source.shape)

    model = FastSAM('FastSAM-x.pt')  

    # Run inference on the source image
    everything_results = model(source, device='cpu', retina_masks=True, imgsz=source.shape[0], conf=0.1, iou=0.99)

Do you have any further thoughts on what might be causing this issue, especially given it seems to occur under conditions that previously succeeded? Thank you again for your help!

pderrenger Mar 20, 2024
Maintainer

Hi there! 🙌 It sounds like you've precisely implemented the image resizing approach, but it's puzzling that you're encountering the runtime error even after ensuring image dimensions align with the model's requirements. Given that reshaping the images adheres to FastSAM's expected input dimensions, the continuing issue raises some questions.

This error could be tied to the internal workings of the model or the way data is being handled post-resizing. A few thoughts on possible causes or areas to investigate:

Data Type Consistency: Ensure the resized image's data type aligns with what FastSAM expects. Sometimes discrepancies in data types can lead to unexpected behavior. Ensure source is correctly formatted as an array expected by FastSAM.
Batch Dimension: The error hints at a potential mismatch in dimensions, especially considering the tensor size error. FastSAM, like many deep learning models, might expect a batch dimension. Try adding an extra dimension to your input:
```
source = source[np.newaxis, ...]
```
This modification simulates a batch of images, even when you're processing only one image.
Examine Model Input: Directly review FastSAM's expected input dimensions and ensure your preprocessed inputs precisely match. It could be that an additional expectation isn't being met.
Model Reload: It seems the model is reloaded with every function call in your runSAM function. Verify if reloading the model might disrupt internal states or expected dimensions. Ideally, load the model once and pass it to your function if necessary rather than reloading it each time.

If these suggestions don't clear the issue, it might be helpful to delve deeper into the FastSAM codebase or reach out to the maintainers with a detailed error report. Sometimes, nuances in model architecture or preprocessing steps might not be entirely clear without a deep dive.

Thanks for persevering with this, and I hope one of these insights provides a pathway to a solution! Keep us posted on your progress. 👀
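
The "load the model once" suggestion can be sketched with a small cached factory. This is only an illustration of the pattern: `get_model` and `fake_loader` are hypothetical names, and `fake_loader` stands in for the real constructor (`FastSAM` in the code above) so that the example runs without any model weights.

```python
from functools import lru_cache
from typing import Any, Callable

@lru_cache(maxsize=None)
def get_model(weights: str, loader: Callable[[str], Any]) -> Any:
    """Build the model once per (weights, loader) pair and reuse it on later calls."""
    return loader(weights)

# Stand-in loader that records how often it is called
calls = []

def fake_loader(weights: str) -> dict:
    calls.append(weights)
    return {"weights": weights}

first = get_model("FastSAM-x.pt", fake_loader)
second = get_model("FastSAM-x.pt", fake_loader)  # served from the cache, no reload
```

Inside a function like `runSAM`, the per-call construction would then be replaced by a lookup such as `get_model('FastSAM-x.pt', FastSAM)`, or the model could simply be created once at module level and passed in as an argument.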

ysig · 2024-11-20T11:09:45Z

ysig
Nov 20, 2024

Given that FastSAM is fast, a user may want to run it during training to fine-tune on a subtask.
To that end it would be nice to provide code that does the following:

x = process_image_fsam('image.png')
# process_image_fsam: currently absent
x = model(x)
# x.masks: mask of the segmentation - exists.
# x.features: average feature of the segmentation - not clear how to get, but probably it exists because it's used to compute cosine similarity when using text.

Thank you,

1 reply

glenn-jocher Nov 20, 2024
Maintainer

@ysig thank you for your suggestion. While FastSAM is designed for efficient and fast segmentation tasks, extracting average features of segmentation is not directly supported in the current implementation. You may explore extending the model for such capabilities by integrating custom feature extraction methods. For more details on using FastSAM, please refer to the FastSAM documentation.
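
One possible custom approach is masked average pooling: given a feature map and a set of masks at the same spatial resolution, average the features inside each mask. The sketch below is an assumption about what such an extension could look like, not an existing FastSAM feature. `masked_average_features` is a hypothetical helper, and where the feature map would come from inside the model is left open, as it is in this thread.

```python
import numpy as np

def masked_average_features(features: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Average a (C, H, W) feature map inside each of N boolean (H, W) masks, giving (N, C)."""
    flat_feats = features.reshape(features.shape[0], -1)                    # (C, H*W)
    flat_masks = masks.reshape(masks.shape[0], -1).astype(features.dtype)   # (N, H*W)
    sums = flat_masks @ flat_feats.T                                        # (N, C) per-mask sums
    counts = flat_masks.sum(axis=1, keepdims=True)                          # (N, 1) pixels per mask
    return sums / np.maximum(counts, 1)                                     # empty masks give zeros

# Tiny synthetic example: 2 channels on a 2x2 grid, one mask on the top row, one empty mask
features = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
masks = np.array([[[True, True], [False, False]],
                  [[False, False], [False, False]]])
pooled = masked_average_features(features, masks)
```

If the masks come from `results[0].masks.data` and the features from an intermediate layer, one of the two would first need to be resized so that their spatial dimensions match.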
