robotic_sdk.ml_models.grounded_sam_2.utils package

Submodules

robotic_sdk.ml_models.grounded_sam_2.utils.build_sam module

robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model module

class robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model.MaskDictionaryModel(mask_name: str = '', mask_height: int = 1080, mask_width: int = 1920, promote_type: str = 'mask', labels: dict = <factory>)

Bases: object

__init__(mask_name: str = '', mask_height: int = 1080, mask_width: int = 1920, promote_type: str = 'mask', labels: dict = <factory>) None
add_new_frame_annotation(mask_list, box_list, label_list, background_value=0)
static calculate_iou(mask1, mask2)
from_json(json_file)
get_target_class_name(instance_id)
get_target_logit(instance_id)
labels: dict
mask_height: int = 1080
mask_name: str = ''
mask_width: int = 1920
promote_type: str = 'mask'
save_empty_mask_and_json(mask_data_dir, json_data_dir, image_name_list=None)
to_dict()
to_json(json_file)
update_masks(tracking_annotation_dict, iou_threshold=0.8, objects_count=0)
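The body of `calculate_iou` is not shown in this reference; a minimal sketch of mask IoU for two binary arrays (assuming NumPy-compatible inputs of the same shape) could look like:

```python
import numpy as np

def calculate_iou(mask1: np.ndarray, mask2: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    m1 = mask1.astype(bool)
    m2 = mask2.astype(bool)
    union = np.logical_or(m1, m2).sum()
    if union == 0:
        return 0.0  # both masks empty: define IoU as 0
    intersection = np.logical_and(m1, m2).sum()
    return float(intersection / union)
```

`update_masks` can then match tracked objects against new detections by thresholding this value (`iou_threshold=0.8` by default).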
class robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model.ObjectInfo(instance_id: int = 0, mask: any | None = None, class_name: str = '', x1: int = 0, y1: int = 0, x2: int = 0, y2: int = 0, logit: float = 0.0)

Bases: object

__init__(instance_id: int = 0, mask: any | None = None, class_name: str = '', x1: int = 0, y1: int = 0, x2: int = 0, y2: int = 0, logit: float = 0.0) None
class_name: str = ''
get_id()
get_mask()
instance_id: int = 0
logit: float = 0.0
mask: any = None
to_dict()
update_box()
x1: int = 0
x2: int = 0
y1: int = 0
y2: int = 0

robotic_sdk.ml_models.grounded_sam_2.utils.misc module

class robotic_sdk.ml_models.grounded_sam_2.utils.misc.AsyncVideoFrameLoader(img_paths, image_size, offload_video_to_cpu, img_mean, img_std, compute_device)

Bases: object

A list of video frames to be loaded asynchronously without blocking session start.

__init__(img_paths, image_size, offload_video_to_cpu, img_mean, img_std, compute_device)
robotic_sdk.ml_models.grounded_sam_2.utils.misc.concat_points(old_point_inputs, new_points, new_labels)

Add new points and labels to previous point inputs (add at the end).
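Assuming point inputs are stored as a dict with "point_coords" (shape [B, N, 2]) and "point_labels" (shape [B, N]) entries, the append behaviour can be sketched as follows (shown here with NumPy in place of torch tensors):

```python
import numpy as np

def concat_points(old_point_inputs, new_points, new_labels):
    """Append new points/labels to previous point inputs along the point axis."""
    if old_point_inputs is None:
        points, labels = new_points, new_labels
    else:
        points = np.concatenate([old_point_inputs["point_coords"], new_points], axis=1)
        labels = np.concatenate([old_point_inputs["point_labels"], new_labels], axis=1)
    return {"point_coords": points, "point_labels": labels}
```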

robotic_sdk.ml_models.grounded_sam_2.utils.misc.fill_holes_in_mask_scores(mask, max_area)

A post processor to fill small holes in mask scores with area under max_area.
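One common way to implement this kind of post-processing (a 2D sketch of the idea, not necessarily the SDK's exact approach) is to label connected components of the non-positive region and flip any component whose area is below max_area to a small positive score:

```python
import numpy as np
from collections import deque

def fill_holes_in_mask_scores(mask: np.ndarray, max_area: int) -> np.ndarray:
    """Fill background (score <= 0) connected components smaller than
    max_area with a small positive score. 2D sketch, 4-connectivity."""
    out = mask.copy()
    background = out <= 0
    visited = np.zeros_like(background, dtype=bool)
    h, w = background.shape
    for sy in range(h):
        for sx in range(w):
            if background[sy, sx] and not visited[sy, sx]:
                # BFS to collect one background component
                comp = []
                queue = deque([(sy, sx)])
                visited[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and background[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) < max_area:
                    for y, x in comp:
                        out[y, x] = 0.1  # small positive score fills the hole
    return out
```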

robotic_sdk.ml_models.grounded_sam_2.utils.misc.get_connected_components(mask)

Get the connected components (8-connectivity) of binary masks of shape (N, 1, H, W).

Inputs:

  • mask: A binary mask tensor of shape (N, 1, H, W), where 1 is foreground and 0 is background.

Outputs:

  • labels: A tensor of shape (N, 1, H, W) containing the connected component labels for foreground pixels and 0 for background pixels.

  • counts: A tensor of shape (N, 1, H, W) containing the area of the connected components for foreground pixels and 0 for background pixels.
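The production implementation typically runs batched on GPU; as a CPU reference for the labels/counts contract, a sketch for a single 2D mask with 8-connectivity could look like:

```python
import numpy as np
from collections import deque

def get_connected_components(mask: np.ndarray):
    """Label 8-connected foreground components of a 2D binary mask.
    Returns (labels, counts) shaped like mask: labels holds a 1-based
    component id for foreground pixels, counts its area; 0 elsewhere."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int64)
    counts = np.zeros((h, w), dtype=np.int64)
    next_label = 0
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                next_label += 1
                comp = [(sy, sx)]
                labels[sy, sx] = next_label
                queue = deque(comp)
                while queue:  # BFS over the component
                    y, x = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            comp.append((ny, nx))
                            queue.append((ny, nx))
                for y, x in comp:
                    counts[y, x] = len(comp)
    return labels, counts
```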

robotic_sdk.ml_models.grounded_sam_2.utils.misc.get_sdpa_settings()
robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), async_loading_frames=False, compute_device=device(type='cuda'))

Load the video frames from video_path. The frames are resized to image_size as in the model and are loaded to GPU if offload_video_to_cpu=False. This is used by the demo.

robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames_from_jpg_images(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), async_loading_frames=False, compute_device=device(type='cuda'))

Load the video frames from a directory of JPEG files (“<frame_index>.jpg” format).

The frames are resized to image_size x image_size and are loaded to GPU if offload_video_to_cpu is False and to CPU if offload_video_to_cpu is True.

You can load a frame asynchronously by setting async_loading_frames to True.
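The "<frame_index>.jpg" naming implies a numeric sort of the filenames rather than a lexicographic one; a sketch of that ordering plus the ImageNet normalisation (assuming frames already decoded to float CHW arrays in [0, 1]) could look like:

```python
import os
import numpy as np

IMG_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMG_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def sort_frame_paths(frame_dir: str):
    """Order '<frame_index>.jpg' files numerically, not lexicographically
    (so '2.jpg' precedes '10.jpg')."""
    names = [n for n in os.listdir(frame_dir) if n.lower().endswith(".jpg")]
    names.sort(key=lambda n: int(os.path.splitext(n)[0]))
    return [os.path.join(frame_dir, n) for n in names]

def normalize_frame(frame_chw: np.ndarray) -> np.ndarray:
    """Apply the default img_mean/img_std used by the loaders above."""
    return (frame_chw - IMG_MEAN) / IMG_STD
```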

robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames_from_video_file(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), compute_device=device(type='cuda'))

Load the video frames from a video file.

robotic_sdk.ml_models.grounded_sam_2.utils.misc.mask_to_box(masks: Tensor)

Compute bounding boxes from input masks.

Inputs:

  • masks: [B, 1, H, W] masks, dtype=torch.Tensor

Returns:

  • box_coords: [B, 1, 4], contains the (x, y) coordinates of the top-left and bottom-right box corners, dtype=torch.Tensor
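A NumPy sketch of the batched mask-to-box computation described above (the SDK version operates on torch tensors; empty masks are left as a zero box here):

```python
import numpy as np

def mask_to_box(masks: np.ndarray) -> np.ndarray:
    """[B, 1, H, W] binary masks -> [B, 1, 4] boxes (x_min, y_min, x_max, y_max)."""
    b = masks.shape[0]
    boxes = np.zeros((b, 1, 4), dtype=np.int64)
    for i in range(b):
        ys, xs = np.nonzero(masks[i, 0])
        if len(xs) > 0:
            boxes[i, 0] = [xs.min(), ys.min(), xs.max(), ys.max()]
    return boxes
```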

robotic_sdk.ml_models.grounded_sam_2.utils.misc.process_stream_frame(img_array: ndarray, image_size: int, img_mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), img_std: Tuple[float, float, float] = (0.229, 0.224, 0.225), offload_to_cpu: bool = False, compute_device: device = device(type='cuda'))

Convert a raw image array (H,W,3 or 3,H,W) into a model-ready tensor.

Steps:

  1. Resize the shorter side to image_size, keeping aspect ratio, then center-crop/pad to image_size × image_size.

  2. Change layout to [3, H, W] and cast to float32 in [0, 1].

  3. Normalise with ImageNet statistics.

  4. Optionally move to compute_device.

Returns:

  • img_tensor (torch.FloatTensor, shape [3, image_size, image_size])

  • orig_h (int)

  • orig_w (int)
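The layout and normalisation steps can be sketched as follows; the resize/crop step is omitted for brevity, so the input is assumed to already have the target spatial size (NumPy stands in for torch here):

```python
import numpy as np

IMG_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
IMG_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)

def preprocess_frame(img_array: np.ndarray):
    """Sketch of process_stream_frame minus the resize: accepts (H, W, 3) or
    (3, H, W), returns a normalised float32 (3, H, W) array plus orig_h/orig_w."""
    if img_array.ndim != 3:
        raise ValueError("expected a 3-dimensional image array")
    if img_array.shape[-1] == 3 and img_array.shape[0] != 3:
        img_array = img_array.transpose(2, 0, 1)  # HWC -> CHW
    orig_h, orig_w = img_array.shape[1], img_array.shape[2]
    img = img_array.astype(np.float32)
    if img.max() > 1.0:
        img /= 255.0  # assume uint8 input; scale to [0, 1]
    img = (img - IMG_MEAN) / IMG_STD
    return img, orig_h, orig_w
```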

robotic_sdk.ml_models.grounded_sam_2.utils.sam2_video_predictor module

Module contents