robotic_sdk.ml_models.grounded_sam_2.utils package
Submodules
robotic_sdk.ml_models.grounded_sam_2.utils.build_sam module
robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model module
- class robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model.MaskDictionaryModel(mask_name: str = '', mask_height: int = 1080, mask_width: int = 1920, promote_type: str = 'mask', labels: dict = <factory>)
Bases: object
- __init__(mask_name: str = '', mask_height: int = 1080, mask_width: int = 1920, promote_type: str = 'mask', labels: dict = <factory>) → None
- add_new_frame_annotation(mask_list, box_list, label_list, background_value=0)
- static calculate_iou(mask1, mask2)
- from_json(json_file)
- get_target_class_name(instance_id)
- get_target_logit(instance_id)
- labels: dict
- mask_height: int = 1080
- mask_name: str = ''
- mask_width: int = 1920
- promote_type: str = 'mask'
- save_empty_mask_and_json(mask_data_dir, json_data_dir, image_name_list=None)
- to_dict()
- to_json(json_file)
- update_masks(tracking_annotation_dict, iou_threshold=0.8, objects_count=0)
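A minimal usage sketch for the static calculate_iou helper, assuming it accepts two same-shape mask tensors (nonzero = foreground) and returns intersection-over-union as a scalar; the shapes and values below are illustrative only.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model import MaskDictionaryModel

# Two overlapping single-channel masks (1.0 = foreground, 0.0 = background).
mask_a = torch.zeros(1080, 1920)
mask_b = torch.zeros(1080, 1920)
mask_a[100:200, 100:200] = 1.0
mask_b[150:250, 150:250] = 1.0

# Static helper: no instance needed. Assumed to return intersection / union.
iou = MaskDictionaryModel.calculate_iou(mask_a, mask_b)
print(float(iou))  # ~0.143: two 100x100 squares overlapping in a 50x50 region
```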
- class robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model.ObjectInfo(instance_id: int = 0, mask: any = None, class_name: str = '', x1: int = 0, y1: int = 0, x2: int = 0, y2: int = 0, logit: float = 0.0)
Bases: object
- __init__(instance_id: int = 0, mask: any | None = None, class_name: str = '', x1: int = 0, y1: int = 0, x2: int = 0, y2: int = 0, logit: float = 0.0) → None
- class_name: str = ''
- get_id()
- get_mask()
- instance_id: int = 0
- logit: float = 0.0
- mask: any = None
- to_dict()
- update_box()
- x1: int = 0
- x2: int = 0
- y1: int = 0
- y2: int = 0
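A hedged sketch of ObjectInfo, assuming update_box() derives the (x1, y1, x2, y2) corners from the stored mask tensor; the field semantics are inferred from the signature above rather than documented behavior.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.mask_dictionary_model import ObjectInfo

obj = ObjectInfo(instance_id=1, class_name="cup")
obj.mask = torch.zeros(1080, 1920, dtype=torch.bool)
obj.mask[300:400, 500:700] = True  # foreground region for this instance

obj.update_box()  # assumed to recompute x1/y1/x2/y2 from obj.mask
print(obj.get_id(), obj.to_dict())
```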
robotic_sdk.ml_models.grounded_sam_2.utils.misc module
- class robotic_sdk.ml_models.grounded_sam_2.utils.misc.AsyncVideoFrameLoader(img_paths, image_size, offload_video_to_cpu, img_mean, img_std, compute_device)
Bases: object
A list of video frames to be loaded asynchronously without blocking session start.
- __init__(img_paths, image_size, offload_video_to_cpu, img_mean, img_std, compute_device)
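A construction sketch, assuming the loader behaves like an indexable sequence once built; the frame paths, normalization-tensor shapes, and index access here are assumptions to verify against your build.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import AsyncVideoFrameLoader

loader = AsyncVideoFrameLoader(
    img_paths=["frames/000000.jpg", "frames/000001.jpg"],  # hypothetical frame files
    image_size=1024,
    offload_video_to_cpu=True,
    img_mean=torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1),
    img_std=torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1),
    compute_device=torch.device("cpu"),
)
frame_0 = loader[0]  # assumed index access; remaining frames keep loading in the background
```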
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.concat_points(old_point_inputs, new_points, new_labels)
Add new points and labels to previous point inputs (add at the end).
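A sketch of incremental point prompting, assuming point coordinates are [B, N, 2] tensors, labels are [B, N] (1 = positive, 0 = negative), and that passing None as old_point_inputs starts a fresh set; these conventions mirror typical SAM-style prompts rather than a documented contract.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import concat_points

points = torch.tensor([[[210.0, 350.0]]])        # [B=1, N=1, 2] pixel coordinates
labels = torch.tensor([[1]], dtype=torch.int32)  # 1 = positive click (assumed)
point_inputs = concat_points(None, points, labels)  # assumed: None starts a new set

more_points = torch.tensor([[[400.0, 120.0]]])
more_labels = torch.tensor([[0]], dtype=torch.int32)  # 0 = negative click (assumed)
point_inputs = concat_points(point_inputs, more_points, more_labels)  # appended at the end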
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.fill_holes_in_mask_scores(mask, max_area)
A post processor to fill small holes in mask scores with area under max_area.
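A call sketch, assuming mask is a [N, 1, H, W] score/logit tensor where positive values mean foreground; note the underlying connected-components op may require a CUDA device, so the tensor is placed on the GPU here.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import fill_holes_in_mask_scores

mask_scores = torch.randn(1, 1, 512, 512, device="cuda")  # raw mask logits (shape assumed)
cleaned = fill_holes_in_mask_scores(mask_scores, max_area=100)  # fill holes under 100 px
```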
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.get_connected_components(mask)
Get the connected components (8-connectivity) of binary masks of shape (N, 1, H, W).
Inputs: - mask: A binary mask tensor of shape (N, 1, H, W), where 1 is foreground and 0 is
background.
Outputs: - labels: A tensor of shape (N, 1, H, W) containing the connected component labels
for foreground pixels and 0 for background pixels.
- counts: A tensor of shape (N, 1, H, W) containing the area of the connected
components for foreground pixels and 0 for background pixels.
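A sketch with two disjoint squares; in upstream SAM 2 this op is backed by a CUDA extension, so placing the mask on the GPU is an assumption carried over here.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import get_connected_components

mask = torch.zeros(1, 1, 64, 64, dtype=torch.uint8, device="cuda")
mask[0, 0, 5:15, 5:15] = 1    # first component, area 100
mask[0, 0, 30:40, 30:40] = 1  # second component, area 100
labels, counts = get_connected_components(mask)
# labels: per-pixel component id (0 = background); counts: area of each pixel's component
```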
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.get_sdpa_settings()
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), async_loading_frames=False, compute_device=device(type='cuda'))
Load the video frames from video_path. The frames are resized to image_size as in the model and are loaded to GPU if offload_video_to_cpu=False. This is used by the demo.
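A call sketch; the three return values follow the upstream SAM 2 convention (frame tensor plus original height/width), which is an assumption here, and the path is hypothetical.

```python
from robotic_sdk.ml_models.grounded_sam_2.utils.misc import load_video_frames

images, video_height, video_width = load_video_frames(  # return triple assumed from upstream SAM 2
    video_path="videos/demo",   # hypothetical: a JPEG-frame directory or a video file
    image_size=1024,
    offload_video_to_cpu=True,  # keep frames on CPU to save GPU memory
)
```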
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames_from_jpg_images(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), async_loading_frames=False, compute_device=device(type='cuda'))
Load the video frames from a directory of JPEG files (“<frame_index>.jpg” format).
The frames are resized to image_size x image_size and are loaded to GPU if offload_video_to_cpu is False and to CPU if offload_video_to_cpu is True.
You can load a frame asynchronously by setting async_loading_frames to True.
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.load_video_frames_from_video_file(video_path, image_size, offload_video_to_cpu, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225), compute_device=device(type='cuda'))
Load the video frames from a video file.
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.mask_to_box(masks: Tensor)
Compute the bounding box given an input mask.
Inputs: - masks: [B, 1, H, W] mask tensor (torch.Tensor)
Returns: - box_coords: [B, 1, 4] torch.Tensor containing the (x, y) coordinates of the top-left and bottom-right box corners
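A minimal sketch with boolean masks; the shape and return conventions follow the docstring above.

```python
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import mask_to_box

masks = torch.zeros(2, 1, 256, 256, dtype=torch.bool)
masks[0, 0, 10:50, 20:80] = True    # object 1
masks[1, 0, 100:140, 60:90] = True  # object 2
boxes = mask_to_box(masks)          # [2, 1, 4]: (x_min, y_min, x_max, y_max) per mask
```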
- robotic_sdk.ml_models.grounded_sam_2.utils.misc.process_stream_frame(img_array: ndarray, image_size: int, img_mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), img_std: Tuple[float, float, float] = (0.229, 0.224, 0.225), offload_to_cpu: bool = False, compute_device: device = device(type='cuda'))
Convert a raw image array (H,W,3 or 3,H,W) into a model-ready tensor.
Steps:
1. Resize the shorter side to image_size, keeping aspect ratio, then center-crop/pad to image_size × image_size.
2. Change layout to [3, H, W] and cast to float32 in [0, 1].
3. Normalise with ImageNet statistics.
4. Optionally move to compute_device.
Returns: - img_tensor: torch.FloatTensor of shape [3, image_size, image_size]
- orig_h (int)
- orig_w (int)
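A sketch with a synthetic camera frame, assuming the function accepts uint8 HWC input; the return triple follows the docstring above.

```python
import numpy as np
import torch

from robotic_sdk.ml_models.grounded_sam_2.utils.misc import process_stream_frame

frame = np.random.randint(0, 255, size=(720, 1280, 3), dtype=np.uint8)  # fake camera frame
img_tensor, orig_h, orig_w = process_stream_frame(
    frame,
    image_size=1024,
    offload_to_cpu=True,                 # keep the result on CPU
    compute_device=torch.device("cpu"),
)
# img_tensor: float32 [3, 1024, 1024]; orig_h == 720, orig_w == 1280
```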