🌐 [i18n-KO] Translated mask_generation.md to Korean #32257

Merged · 11 commits · Aug 6, 2024
4 changes: 2 additions & 2 deletions docs/source/ko/_toctree.yml
@@ -77,8 +77,8 @@
  - local: in_translation
    title: (In translation) Image-to-Image
  - local: in_translation
    title: (In translation) Image Feature Extraction
-  - local: in_translation
-    title: (In translation) Mask Generation
+  - local: tasks/mask_generation
+    title: Mask Generation
  - local: in_translation
    title: (In translation) Knowledge Distillation for Computer Vision
  title: Computer Vision
228 changes: 228 additions & 0 deletions docs/source/ko/tasks/mask_generation.md
@@ -0,0 +1,228 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Mask Generation[[mask-generation]]

Mask generation is the task of generating semantically meaningful masks for an image.
This task is very similar to [image segmentation](semantic_segmentation), but there are many differences. Image segmentation models are trained on labeled datasets and are limited to the classes they have seen during training; given an image, they return a set of masks with their corresponding classes.

Mask generation models, on the other hand, are trained on large amounts of data and operate in two modes:
- Prompting mode: the model takes in an image and a prompt, where the prompt can be a 2D point (XY coordinates) on an object in the image or a bounding box surrounding an object. In prompting mode, the model only returns the mask for the object the prompt points to.
- Segment Everything mode: the model generates every mask in the given image. To do so, a grid of points is generated and overlaid on the image for inference.

마슀크 생성 μž‘μ—…μ€ [전체 λΆ„ν•  λͺ¨λ“œ(Segment Anything Model, SAM)](model_doc/sam)에 μ˜ν•΄ μ§€μ›λ©λ‹ˆλ‹€. SAM은 Vision Transformer 기반 이미지 인코더, ν”„λ‘¬ν”„νŠΈ 인코더, 그리고 μ–‘λ°©ν–₯ 트랜슀포머 마슀크 λ””μ½”λ”λ‘œ κ΅¬μ„±λœ κ°•λ ₯ν•œ λͺ¨λΈμž…λ‹ˆλ‹€. 이미지와 ν”„λ‘¬ν”„νŠΈλŠ” μΈμ½”λ”©λ˜κ³ , λ””μ½”λ”λŠ” μ΄λŸ¬ν•œ μž„λ² λ”©μ„ λ°›μ•„ μœ νš¨ν•œ 마슀크λ₯Ό μƒμ„±ν•©λ‹ˆλ‹€.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/sam.png" alt="SAM Architecture"/>
</div>

SAM is a powerful foundation model for segmentation thanks to its large data coverage. It was trained on [SA-1B](https://ai.meta.com/datasets/segment-anything/), a dataset with 11 million images and 1.1 billion masks.

이 κ°€μ΄λ“œμ—μ„œλŠ” λ‹€μŒκ³Ό 같은 λ‚΄μš©μ„ 배우게 λ©λ‹ˆλ‹€:
- 배치 μ²˜λ¦¬μ™€ ν•¨κ»˜ 전체 λΆ„ν•  λͺ¨λ“œμ—μ„œ μΆ”λ‘ ν•˜λŠ” 방법
- 포인트 ν”„λ‘¬ν”„νŒ… λͺ¨λ“œμ—μ„œ μΆ”λ‘ ν•˜λŠ” 방법
- λ°•μŠ€ ν”„λ‘¬ν”„νŒ… λͺ¨λ“œμ—μ„œ μΆ”λ‘ ν•˜λŠ” 방법

First, let's install `transformers`:

```bash
pip install -q transformers
```
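
The visualization steps later in this guide also use Pillow, matplotlib, and NumPy. This extra install is an addition to the original guide, not a requirement of `transformers` itself; if those packages are not already in your environment, they can be installed the same way:

```bash
pip install -q pillow matplotlib numpy
```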

## 마슀크 생성 νŒŒμ΄ν”„λΌμΈ[[mask-generation-pipeline]]

마슀크 생성 λͺ¨λΈλ‘œ μΆ”λ‘ ν•˜λŠ” κ°€μž₯ μ‰¬μš΄ 방법은 `mask-generation` νŒŒμ΄ν”„λΌμΈμ„ μ‚¬μš©ν•˜λŠ” κ²ƒμž…λ‹ˆλ‹€.

```python
>>> from transformers import pipeline

>>> checkpoint = "facebook/sam-vit-base"
>>> mask_generator = pipeline(model=checkpoint, task="mask-generation")
```
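
By default the pipeline runs on CPU. If a GPU is available, you can place the model on it with the pipeline's `device` argument; a minimal sketch, where the device index `0` is an assumption about your machine:

```python
>>> mask_generator = pipeline(model=checkpoint, task="mask-generation", device=0)  # first CUDA device
```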

Let's look at an example image.

```python
from PIL import Image
import requests

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg" alt="Example Image"/>
</div>

μ „μ²΄μ μœΌλ‘œ λΆ„ν• ν•΄λ΄…μ‹œλ‹€. `points-per-batch`λŠ” 전체 λΆ„ν•  λͺ¨λ“œμ—μ„œ μ λ“€μ˜ 병렬 좔둠을 κ°€λŠ₯ν•˜κ²Œ ν•©λ‹ˆλ‹€. 이λ₯Ό 톡해 μΆ”λ‘  속도가 λΉ¨λΌμ§€μ§€λ§Œ, 더 λ§Žμ€ λ©”λͺ¨λ¦¬λ₯Ό μ†Œλͺ¨ν•˜κ²Œ λ©λ‹ˆλ‹€. λ˜ν•œ, SAM은 이미지가 μ•„λ‹Œ 점듀에 λŒ€ν•΄μ„œλ§Œ 배치 처리λ₯Ό μ§€μ›ν•©λ‹ˆλ‹€. `pred_iou_thresh`λŠ” IoU μ‹ λ’° μž„κ³„κ°’μœΌλ‘œ, 이 μž„κ³„κ°’μ„ μ΄ˆκ³Όν•˜λŠ” 마슀크만 λ°˜ν™˜λ©λ‹ˆλ‹€.

```python
masks = mask_generator(image, points_per_batch=128, pred_iou_thresh=0.88)
```

`masks` looks like the following:

```bash
{'masks': [array([[False, False, False, ...,  True,  True,  True],
        [False, False, False, ...,  True,  True,  True],
        [False, False, False, ...,  True,  True,  True],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]),
 array([[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
 'scores': tensor([0.9972, 0.9917,
        ...,
}
```

μœ„ λ‚΄μš©μ„ μ•„λž˜μ™€ 같이 μ‹œκ°ν™”ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

```python
import matplotlib.pyplot as plt

plt.imshow(image, cmap='gray')

for mask in masks["masks"]:
    plt.imshow(mask, cmap='viridis', alpha=0.1, vmin=0, vmax=1)

plt.axis('off')
plt.show()
```

Here is the original image in grayscale with colorful masks overlaid on top. Very impressive.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee_segmented.png" alt="Visualized"/>
</div>
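
Segment Everything mode can return hundreds of masks, so it is sometimes useful to keep only the larger ones before plotting. A small sketch of such filtering; the 5,000-pixel area threshold is an arbitrary assumption, not part of the original guide:

```python
# each mask is a boolean array, so its sum is the covered area in pixels
large_masks = [m for m in masks["masks"] if m.sum() > 5000]
print(f"kept {len(large_masks)} of {len(masks['masks'])} masks")
```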

## λͺ¨λΈ μΆ”λ‘ [[model-inference]]

### Point prompting[[point-prompting]]

νŒŒμ΄ν”„λΌμΈ 없이도 λͺ¨λΈμ„ μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. 이λ₯Ό μœ„ν•΄ λͺ¨λΈκ³Ό ν”„λ‘œμ„Έμ„œλ₯Ό μ΄ˆκΈ°ν™”ν•΄μ•Ό ν•©λ‹ˆλ‹€.

```python
from transformers import SamModel, SamProcessor
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
```

포인트 ν”„λ‘¬ν”„νŒ…μ„ ν•˜κΈ° μœ„ν•΄, μž…λ ₯ 포인트λ₯Ό ν”„λ‘œμ„Έμ„œμ— μ „λ‹¬ν•œ λ‹€μŒ, ν”„λ‘œμ„Έμ„œ 좜λ ₯을 λ°›μ•„ λͺ¨λΈμ— μ „λ‹¬ν•˜μ—¬ μΆ”λ‘ ν•©λ‹ˆλ‹€. λͺ¨λΈ 좜λ ₯을 ν›„μ²˜λ¦¬ν•˜λ €λ©΄, 좜λ ₯κ³Ό ν•¨κ»˜ ν”„λ‘œμ„Έμ„œμ˜ 초기 좜λ ₯μ—μ„œ κ°€μ Έμ˜¨ `original_sizes`와 `reshaped_input_sizes`λ₯Ό 전달해야 ν•©λ‹ˆλ‹€. μ™œλƒν•˜λ©΄, ν”„λ‘œμ„Έμ„œκ°€ 이미지 크기λ₯Ό μ‘°μ •ν•˜κ³  좜λ ₯을 μΆ”μ •ν•΄μ•Ό ν•˜κΈ° λ•Œλ¬Έμž…λ‹ˆλ‹€.

```python
input_points = [[[2592, 1728]]]  # point location of the bee

inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
masks = processor.image_processor.post_process_masks(outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu())
```
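
SAM predicts three candidate masks per prompt, which is why the post-processed output carries a mask dimension of three. A quick sanity check, with the shape annotated as a descriptive comment:

```python
print(masks[0].shape)  # (number of prompts, masks per prompt, height, width)
```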

We can visualize the three masks in the `masks` output.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 4, figsize=(15, 5))

axes[0].imshow(image)
axes[0].set_title('Original Image')
mask_list = [masks[0][0][0].numpy(), masks[0][0][1].numpy(), masks[0][0][2].numpy()]

for i, mask in enumerate(mask_list, start=1):
    overlayed_image = np.array(image).copy()

    overlayed_image[:,:,0] = np.where(mask == 1, 255, overlayed_image[:,:,0])
    overlayed_image[:,:,1] = np.where(mask == 1, 0, overlayed_image[:,:,1])
    overlayed_image[:,:,2] = np.where(mask == 1, 0, overlayed_image[:,:,2])

    axes[i].imshow(overlayed_image)
    axes[i].set_title(f'Mask {i}')

for ax in axes:
    ax.axis('off')

plt.show()
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/masks.png" alt="Visualized"/>
</div>
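
SAM also predicts an IoU score for each of the three candidate masks, exposed as `outputs.iou_scores`. If you only need the single mask the model is most confident about, you can select it by score; a minimal sketch building on the `outputs` and `masks` computed above:

```python
import torch

# index of the highest-scoring mask for the first (and only) prompt
best_idx = int(torch.argmax(outputs.iou_scores[0, 0]))
best_mask = masks[0][0][best_idx].numpy()
```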

### Box prompting[[box-prompting]]

λ°•μŠ€ ν”„λ‘¬ν”„νŒ…λ„ 포인트 ν”„λ‘¬ν”„νŒ…κ³Ό μœ μ‚¬ν•œ λ°©μ‹μœΌλ‘œ ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μž…λ ₯ λ°•μŠ€λ₯Ό `[x_min, y_min, x_max, y_max]` ν˜•μ‹μ˜ 리슀트둜 μž‘μ„±ν•˜μ—¬ 이미지와 ν•¨κ»˜ `processor`에 전달할 수 μžˆμŠ΅λ‹ˆλ‹€. ν”„λ‘œμ„Έμ„œ 좜λ ₯을 λ°›μ•„ λͺ¨λΈμ— 직접 μ „λ‹¬ν•œ ν›„, λ‹€μ‹œ 좜λ ₯을 ν›„μ²˜λ¦¬ν•΄μ•Ό ν•©λ‹ˆλ‹€.

```python
# 벌 μ£Όμœ„μ˜ λ°”μš΄λ”© λ°•μŠ€
box = [2350, 1600, 2850, 2100]

inputs = processor(
image,
input_boxes=[[[box]]],
return_tensors="pt"
).to("cuda")

with torch.no_grad():
outputs = model(**inputs)

mask = processor.image_processor.post_process_masks(
outputs.pred_masks.cpu(),
inputs["original_sizes"].cpu(),
inputs["reshaped_input_sizes"].cpu()
)[0][0][0].numpy()
```

Now you can visualize the bounding box around the bee, as shown below.

```python
import matplotlib.patches as patches

fig, ax = plt.subplots()
ax.imshow(image)

rectangle = patches.Rectangle((2350, 1600), 500, 500, linewidth=2, edgecolor='r', facecolor='none')
ax.add_patch(rectangle)
ax.axis("off")
plt.show()
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bbox.png" alt="Visualized Bbox"/>
</div>

μ•„λž˜μ—μ„œ μΆ”λ‘  κ²°κ³Όλ₯Ό 확인할 수 μžˆμŠ΅λ‹ˆλ‹€.

```python
fig, ax = plt.subplots()
ax.imshow(image)
ax.imshow(mask, cmap='viridis', alpha=0.4)

ax.axis("off")
plt.show()
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/box_inference.png" alt="Visualized Inference"/>
</div>
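
If you want to reuse the predicted mask outside of matplotlib, it can be exported as a grayscale image; a minimal sketch, where the filename is arbitrary:

```python
from PIL import Image

# convert the boolean mask to 0/255 grayscale and save it
Image.fromarray(mask.astype("uint8") * 255).save("bee_mask.png")
```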