API Documentation

fileio

class mmcv.fileio.BaseStorageBackend[source]

Abstract class of storage backends.

All backends need to implement two APIs: get() and get_text(). get() reads the file as a byte stream and get_text() reads the file as text.

class mmcv.fileio.FileClient(backend='disk', **kwargs)[source]

A general file client to access files in different backends.

The client loads a file or text from its path with a specified backend and returns the content. It can also register other backend accessors with a given name and backend class.

backend

The storage backend type. Options are “disk”, “ceph”, “memcached” and “lmdb”.

Type:str
client

The backend object.

Type:BaseStorageBackend
classmethod register_backend(name, backend=None, force=False)[source]

Register a backend to FileClient.

This method can be used as a normal class method or a decorator.

class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath

FileClient.register_backend('new', NewBackend)

or

@FileClient.register_backend('new')
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath
Parameters:
  • name (str) – The name of the registered backend.
  • backend (class, optional) – The backend class to be registered, which must be a subclass of BaseStorageBackend. When this method is used as a decorator, backend is None. Defaults to None.
  • force (bool, optional) – Whether to override the backend if the name has already been registered. Defaults to False.
mmcv.fileio.load(file, file_format=None, **kwargs)[source]

Load data from json/yaml/pickle files.

This method provides a unified API for loading data from serialized files.

Parameters:
  • file (str or Path or file-like object) – Filename or a file-like object.
  • file_format (str, optional) – If not specified, the file format will be inferred from the file extension, otherwise use the specified one. Currently supported formats include “json”, “yaml/yml” and “pickle/pkl”.
Returns:

The content from the file.
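
The extension-based format inference can be pictured with a small helper. This is a hypothetical sketch for illustration, not mmcv's internal code; the alias table is an assumption based on the formats listed above:

```python
from pathlib import Path

def infer_file_format(file):
    # Map the file extension to one of the supported formats;
    # "yml" and "pkl" are treated as aliases of "yaml" and "pickle".
    suffix = Path(file).suffix.lstrip('.').lower()
    aliases = {'yml': 'yaml', 'pkl': 'pickle'}
    fmt = aliases.get(suffix, suffix)
    if fmt not in ('json', 'yaml', 'pickle'):
        raise TypeError(f'Unsupported file format: {fmt!r}')
    return fmt
```

Passing an explicit file_format bypasses this inference, which is why it is required when loading from a file-like object that has no filename.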

mmcv.fileio.dump(obj, file=None, file_format=None, **kwargs)[source]

Dump data to json/yaml/pickle strings or files.

This method provides a unified API for dumping data as strings or to files, and also supports custom arguments for each file format.

Parameters:
  • obj (any) – The python object to be dumped.
  • file (str or Path or file-like object, optional) – If not specified, then the object is dumped to a str, otherwise to a file specified by the filename or file-like object.
  • file_format (str, optional) – Same as load().
Returns:

True for success, False otherwise.

Return type:

bool

mmcv.fileio.list_from_file(filename, prefix='', offset=0, max_num=0)[source]

Load a text file and parse the content as a list of strings.

Parameters:
  • filename (str) – Filename.
  • prefix (str) – The prefix to be inserted at the beginning of each item.
  • offset (int) – The offset of lines.
  • max_num (int) – The maximum number of lines to be read; zero or a negative value means no limit.
Returns:

A list of strings.

Return type:

list[str]
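
The prefix/offset/max_num behavior can be sketched in plain Python. This is a hypothetical re-implementation for illustration, operating on an iterable of lines rather than a file:

```python
def list_from_lines(lines, prefix='', offset=0, max_num=0):
    # Skip the first `offset` lines, stop after `max_num` items
    # (0 or a negative value means no limit), and prepend `prefix`
    # to each stripped line.
    items = []
    for i, line in enumerate(lines):
        if i < offset:
            continue
        if 0 < max_num <= len(items):
            break
        items.append(prefix + line.rstrip('\n'))
    return items
```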

mmcv.fileio.dict_from_file(filename, key_type=<class 'str'>)[source]

Load a text file and parse the content as a dict.

Each line of the text file should contain two or more columns split by whitespace or tabs. The first column will be parsed as dict keys, and the following columns will be parsed as dict values.

Parameters:
  • filename (str) – Filename.
  • key_type (type) – Type of the dict’s keys. str is used by default, and type conversion will be performed if another type is specified.
Returns:

The parsed contents.

Return type:

dict
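
The parsing rule can be sketched as follows. This is a hypothetical illustration; the convention that a single remaining column yields a scalar value while several yield a list is an assumption about the described behavior:

```python
def dict_from_lines(lines, key_type=str):
    # First column -> key (converted to key_type); one remaining
    # column -> scalar value, several columns -> list of values.
    mapping = {}
    for line in lines:
        items = line.split()
        if not items:
            continue
        mapping[key_type(items[0])] = (
            items[1] if len(items) == 2 else items[1:])
    return mapping
```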

image

mmcv.image.bgr2gray(img, keepdim=False)[source]

Convert a BGR image to a grayscale image.

Parameters:
  • img (ndarray) – The input image.
  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
Returns:

The converted grayscale image.

Return type:

ndarray

mmcv.image.bgr2hls(img)
Convert a BGR image to an HLS image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted HLS image.
Return type:ndarray
mmcv.image.bgr2hsv(img)
Convert a BGR image to an HSV image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted HSV image.
Return type:ndarray
mmcv.image.bgr2rgb(img)
Convert a BGR image to an RGB image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted RGB image.
Return type:ndarray
mmcv.image.gray2bgr(img)[source]

Convert a grayscale image to a BGR image.

Parameters:img (ndarray) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.gray2rgb(img)[source]

Convert a grayscale image to an RGB image.

Parameters:img (ndarray) – The input image.
Returns:The converted RGB image.
Return type:ndarray
mmcv.image.hls2bgr(img)
Convert an HLS image to a BGR image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.hsv2bgr(img)
Convert an HSV image to a BGR image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.imconvert(img, src, dst)[source]

Convert an image from the src colorspace to dst colorspace.

Parameters:
  • img (ndarray) – The input image.
  • src (str) – The source colorspace, e.g., ‘rgb’, ‘hsv’.
  • dst (str) – The destination colorspace, e.g., ‘rgb’, ‘hsv’.
Returns:

The converted image.

Return type:

ndarray

mmcv.image.rgb2bgr(img)
Convert an RGB image to a BGR image.
Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.rgb2gray(img, keepdim=False)[source]

Convert an RGB image to a grayscale image.

Parameters:
  • img (ndarray) – The input image.
  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
Returns:

The converted grayscale image.

Return type:

ndarray
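
For intuition, grayscale conversion is a weighted sum over the channel axis. The sketch below assumes the ITU-R BT.601 luma weights that OpenCV (which mmcv wraps) uses by default; it is an illustration, not mmcv's implementation:

```python
import numpy as np

def rgb2gray_sketch(img, keepdim=False):
    # Weighted sum over the channel axis with BT.601 luma weights.
    gray = img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114
    if keepdim:
        gray = gray[..., None]  # keep a trailing channel dim of size 1
    return gray
```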

mmcv.image.imrescale(img, scale, return_scale=False, interpolation='bilinear')[source]

Resize image while keeping the aspect ratio.

Parameters:
  • img (ndarray) – The input image.
  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image.
  • interpolation (str) – Same as resize().
Returns:

The rescaled image.

Return type:

ndarray

mmcv.image.imresize(img, size, return_scale=False, interpolation='bilinear', out=None)[source]

Resize image to a given size.

Parameters:
  • img (ndarray) – The input image.
  • size (tuple[int]) – Target size (w, h).
  • return_scale (bool) – Whether to return w_scale and h_scale.
  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”.
  • out (ndarray) – The output destination.
Returns:

(resized_img, w_scale, h_scale) or resized_img.

Return type:

tuple | ndarray

mmcv.image.imresize_like(img, dst_img, return_scale=False, interpolation='bilinear')[source]

Resize image to the same size of a given image.

Parameters:
  • img (ndarray) – The input image.
  • dst_img (ndarray) – The target image.
  • return_scale (bool) – Whether to return w_scale and h_scale.
  • interpolation (str) – Same as resize().
Returns:

(resized_img, w_scale, h_scale) or resized_img.

Return type:

tuple or ndarray

mmcv.image.rescale_size(old_size, scale, return_scale=False)[source]

Calculate the new size to be rescaled to.

Parameters:
  • old_size (tuple[int]) – The old size (w, h) of image.
  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.
Returns:

The new rescaled image size.

Return type:

tuple[int]
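
The two scale modes can be sketched like this. This is a hypothetical helper; the round-half-up convention for the final integer size is an assumption:

```python
def rescale_size_sketch(old_size, scale):
    # float scale: multiply both edges by the factor.
    # tuple scale: pick the largest factor that keeps the long edge
    # within max(scale) and the short edge within min(scale).
    w, h = old_size
    if isinstance(scale, (int, float)):
        factor = float(scale)
    else:
        max_long, max_short = max(scale), min(scale)
        factor = min(max_long / max(w, h), max_short / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)
```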

mmcv.image.imcrop(img, bboxes, scale=1.0, pad_fill=None)[source]

Crop image patches.

3 steps: scale the bboxes -> clip bboxes -> crop and pad.

Parameters:
  • img (ndarray) – Image to be cropped.
  • bboxes (ndarray) – Shape (k, 4) or (4, ), location of cropped bboxes.
  • scale (float, optional) – Scale ratio of bboxes, the default value 1.0 means no scaling.
  • pad_fill (Number | list[Number]) – Value to be filled for padding. Default: None, which means no padding.
Returns:

The cropped image patches.

Return type:

list[ndarray] | ndarray
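
The "clip bboxes" step can be illustrated with numpy. This is a hypothetical sketch of that single step, assuming (x1, y1, x2, y2) boxes:

```python
import numpy as np

def clip_bboxes(bboxes, img_w, img_h):
    # Clamp x coordinates to [0, img_w - 1] and y coordinates to
    # [0, img_h - 1] so the crop stays inside the image.
    clipped = bboxes.copy()
    clipped[..., 0::2] = np.clip(clipped[..., 0::2], 0, img_w - 1)
    clipped[..., 1::2] = np.clip(clipped[..., 1::2], 0, img_h - 1)
    return clipped
```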

mmcv.image.imflip(img, direction='horizontal')[source]

Flip an image horizontally or vertically.

Parameters:
  • img (ndarray) – Image to be flipped.
  • direction (str) – The flip direction, either “horizontal” or “vertical”.
Returns:

The flipped image.

Return type:

ndarray
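
A flip is just a reversal along one axis; a minimal numpy equivalent (an illustrative sketch, not necessarily how mmcv implements it):

```python
import numpy as np

def imflip_sketch(img, direction='horizontal'):
    # "horizontal" reverses the width axis (axis 1),
    # "vertical" reverses the height axis (axis 0).
    assert direction in ('horizontal', 'vertical')
    axis = 1 if direction == 'horizontal' else 0
    return np.flip(img, axis=axis)
```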

mmcv.image.imflip_(img, direction='horizontal')[source]

Inplace flip an image horizontally or vertically.

Parameters:
  • img (ndarray) – Image to be flipped.
  • direction (str) – The flip direction, either “horizontal” or “vertical”.
Returns:

The flipped image (inplace).

Return type:

ndarray

mmcv.image.impad(img, *, shape=None, padding=None, pad_val=0, padding_mode='constant')[source]

Pad the given image to a certain shape or pad on all sides with specified padding mode and padding value.

Parameters:
  • img (ndarray) – Image to be padded.
  • shape (tuple[int]) – Expected padding shape (h, w). Default: None.
  • padding (int or tuple[int]) – Padding on each border. If a single int is provided, it is used to pad all borders. If a tuple of length 2 is provided, it is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided, it is the padding for the left, top, right and bottom borders respectively. Default: None. Note that shape and padding cannot both be set.
  • pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default: 0.
  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default: constant.

    • constant: pads with a constant value, this value is specified
      with pad_val.
    • edge: pads with the last value at the edge of the image.
    • reflect: pads with reflection of image without repeating the
      last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
    • symmetric: pads with reflection of image repeating the last
      value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
Returns:

The padded image.

Return type:

ndarray
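
The reflect/symmetric examples above can be checked directly with numpy, whose np.pad shares these mode names (shown in 1-D for clarity):

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
reflected = np.pad(arr, 2, mode='reflect')    # edge value not repeated
symmetric = np.pad(arr, 2, mode='symmetric')  # edge value repeated
```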

mmcv.image.impad_to_multiple(img, divisor, pad_val=0)[source]

Pad an image so that each edge is a multiple of some number.

Parameters:
  • img (ndarray) – Image to be padded.
  • divisor (int) – Padded image edges will be multiples of divisor.
  • pad_val (Number | Sequence[Number]) – Same as impad().
Returns:

The padded image.

Return type:

ndarray

mmcv.image.imrotate(img, angle, center=None, scale=1.0, border_value=0, auto_bound=False)[source]

Rotate an image.

Parameters:
  • img (ndarray) – Image to be rotated.
  • angle (float) – Rotation angle in degrees, positive values mean clockwise rotation.
  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used.
  • scale (float) – Isotropic scale factor.
  • border_value (int) – Border value.
  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image.
Returns:

The rotated image.

Return type:

ndarray

mmcv.image.imfrombytes(content, flag='color', channel_order='bgr', backend=None)[source]

Read an image from bytes.

Parameters:
  • content (bytes) – Image bytes got from files or other streams.
  • flag (str) – Same as imread().
  • backend (str|None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

Loaded image array.

Return type:

ndarray

mmcv.image.imread(img_or_path, flag='color', channel_order='bgr', backend=None)[source]

Read an image.

Parameters:
  • img_or_path (ndarray or str or Path) – Either a numpy array or str or pathlib.Path. If it is a numpy array (loaded image), then it will be returned as is.
  • flag (str) – Flags specifying the color type of a loaded image, candidates are color, grayscale and unchanged. Note that the turbojpeg backend does not support unchanged.
  • channel_order (str) – Order of channel, candidates are bgr and rgb.
  • backend (str|None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

Loaded image array.

Return type:

ndarray

mmcv.image.imwrite(img, file_path, params=None, auto_mkdir=True)[source]

Write image to file.

Parameters:
  • img (ndarray) – Image array to be written.
  • file_path (str) – Image file path.
  • params (None or list) – Same as opencv’s imwrite() interface.
  • auto_mkdir (bool) – If the parent folder of file_path does not exist, whether to create it automatically.
Returns:

Successful or not.

Return type:

bool

mmcv.image.use_backend(backend)[source]

Select a backend for image decoding.

Parameters:
  • backend (str) – The image decoding backend type. Options are cv2, pillow, turbojpeg (see https://github.com/lilohuang/PyTurboJPEG). turbojpeg is faster but only supports the .jpeg file format.
mmcv.image.imnormalize(img, mean, std, to_rgb=True)[source]

Normalize an image with mean and std.

Parameters:
  • img (ndarray) – Image to be normalized.
  • mean (ndarray) – The mean to be used for normalization.
  • std (ndarray) – The std to be used for normalization.
  • to_rgb (bool) – Whether to convert to rgb.
Returns:

The normalized image.

Return type:

ndarray

mmcv.image.imnormalize_(img, mean, std, to_rgb=True)[source]

Inplace normalize an image with mean and std.

Parameters:
  • img (ndarray) – Image to be normalized.
  • mean (ndarray) – The mean to be used for normalization.
  • std (ndarray) – The std to be used for normalization.
  • to_rgb (bool) – Whether to convert to rgb.
Returns:

The normalized image.

Return type:

ndarray

mmcv.image.iminvert(img)[source]

Invert (negate) an image.

Parameters:img (ndarray) – Image to be inverted.
Returns:The inverted image.
Return type:ndarray
mmcv.image.posterize(img, bits)[source]

Posterize an image (reduce the number of bits for each color channel).

Parameters:
  • img (ndarray) – Image to be posterized.
  • bits (int) – Number of bits (1 to 8) to use for posterizing.
Returns:

The posterized image.

Return type:

ndarray

mmcv.image.solarize(img, thr=128)[source]

Solarize an image (invert all pixel values above a threshold).

Parameters:
  • img (ndarray) – Image to be solarized.
  • thr (int) – Threshold for solarizing (0 - 255).
Returns:

The solarized image.

Return type:

ndarray

mmcv.image.rgb2ycbcr(img, y_only=False)[source]

Convert an RGB image to a YCbCr image.

This function produces the same results as Matlab’s rgb2ycbcr function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: RGB <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
  • y_only (bool) – Whether to only return Y channel. Default: False.
Returns:

The converted YCbCr image. The output image has the same type and range as the input image.

Return type:

ndarray
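
For the y_only case, the BT.601 Y channel is a fixed affine combination of the channels. The sketch below assumes the Matlab-style constants for float input in [0, 1], producing Y in [16, 235] before any rescaling; it is an illustration, not mmcv's full implementation:

```python
import numpy as np

def bt601_y(img):
    # Y = 65.481*R + 128.553*G + 24.966*B + 16 for R, G, B in [0, 1].
    return (img[..., 0] * 65.481 + img[..., 1] * 128.553
            + img[..., 2] * 24.966 + 16)
```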

mmcv.image.bgr2ycbcr(img, y_only=False)[source]

Convert a BGR image to a YCbCr image.

The bgr version of rgb2ycbcr. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: BGR <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
  • y_only (bool) – Whether to only return Y channel. Default: False.
Returns:

The converted YCbCr image. The output image has the same type and range as the input image.

Return type:

ndarray

mmcv.image.ycbcr2rgb(img)[source]

Convert a YCbCr image to an RGB image.

This function produces the same results as Matlab’s ycbcr2rgb function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> RGB. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
Returns:The converted RGB image. The output image has the same type and range as the input image.
Return type:ndarray
mmcv.image.ycbcr2bgr(img)[source]

Convert a YCbCr image to a BGR image.

The bgr version of ycbcr2rgb. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> BGR. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
Returns:The converted BGR image. The output image has the same type and range as the input image.
Return type:ndarray
mmcv.image.tensor2imgs(tensor, mean=(0, 0, 0), std=(1, 1, 1), to_rgb=True)[source]

Convert tensor to 3-channel images.

Parameters:
  • tensor (torch.Tensor) – Tensor that contains multiple images, shape (N, C, H, W).
  • mean (tuple[float], optional) – Mean of images. Defaults to (0, 0, 0).
  • std (tuple[float], optional) – Standard deviation of images. Defaults to (1, 1, 1).
  • to_rgb (bool, optional) – Whether the tensor was converted to RGB format in the first place. If so, convert it back to BGR. Defaults to True.
Returns:

A list that contains multiple images.

Return type:

list[np.ndarray]

video

class mmcv.video.VideoReader(filename, cache_capacity=10)[source]

Video class with similar usage to a list object.

This video wrapper class provides convenient APIs to access frames. OpenCV’s VideoCapture class has an issue that jumping to a certain frame may be inaccurate; this class fixes it by checking the position after each jump. A cache is used when decoding videos, so if the same frame is visited a second time and is stored in the cache, there is no need to decode it again.

Example:
>>> import mmcv
>>> v = mmcv.VideoReader('sample.mp4')
>>> len(v)  # get the total frame number with `len()`
120
>>> for img in v:  # v is iterable
>>>     mmcv.imshow(img)
>>> v[5]  # get the 6th frame
current_frame()[source]

Get the current frame (frame that is just visited).

Returns:If the video is fresh, return None, otherwise return the frame.
Return type:ndarray or None
cvt2frames(frame_dir, file_start=0, filename_tmpl='{:06d}.jpg', start=0, max_num=0, show_progress=True)[source]

Convert a video to frame images.

Parameters:
  • frame_dir (str) – Output directory to store all the frame images.
  • file_start (int) – Filenames will start from the specified number.
  • filename_tmpl (str) – Filename template with the index as the placeholder.
  • start (int) – The starting frame index.
  • max_num (int) – Maximum number of frames to be written.
  • show_progress (bool) – Whether to show a progress bar.
fourcc

“Four character code” of the video.

Type:str
fps

FPS of the video.

Type:float
frame_cnt

Total frames of the video.

Type:int
get_frame(frame_id)[source]

Get frame by index.

Parameters:frame_id (int) – Index of the expected frame, 0-based.
Returns:Return the frame if successful, otherwise None.
Return type:ndarray or None
height

Height of video frames.

Type:int
opened

Indicate whether the video is opened.

Type:bool
position

Current cursor position, indicating which frame has been decoded.

Type:int
read()[source]

Read the next frame.

If the next frame has been decoded before and is in the cache, return it directly; otherwise decode it, cache it and return it.

Returns:Return the frame if successful, otherwise None.
Return type:ndarray or None
resolution

Video resolution (width, height).

Type:tuple
vcap

The raw VideoCapture object.

Type:cv2.VideoCapture
width

Width of video frames.

Type:int
mmcv.video.frames2video(frame_dir, video_file, fps=30, fourcc='XVID', filename_tmpl='{:06d}.jpg', start=0, end=0, show_progress=True)[source]

Read the frame images from a directory and join them as a video.

Parameters:
  • frame_dir (str) – The directory containing video frames.
  • video_file (str) – Output filename.
  • fps (float) – FPS of the output video.
  • fourcc (str) – Fourcc of the output video, this should be compatible with the output file type.
  • filename_tmpl (str) – Filename template with the index as the variable.
  • start (int) – Starting frame index.
  • end (int) – Ending frame index.
  • show_progress (bool) – Whether to show a progress bar.
mmcv.video.convert_video(in_file, out_file, print_cmd=False, pre_options='', **kwargs)[source]

Convert a video with ffmpeg.

This provides a general API to ffmpeg; the executed command is:

`ffmpeg -y <pre_options> -i <in_file> <options> <out_file>`

Options(kwargs) are mapped to ffmpeg commands with the following rules:

  • key=val: “-key val”
  • key=True: “-key”
  • key=False: “”
Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • pre_options (str) – Options that appear before “-i <in_file>”.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
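
The three kwargs-to-options mapping rules above can be sketched with a small helper (hypothetical, not part of mmcv's public API):

```python
def build_ffmpeg_options(**kwargs):
    # key=val -> "-key val"; key=True -> "-key"; key=False -> dropped.
    options = []
    for key, val in kwargs.items():
        if val is True:
            options.append(f'-{key}')
        elif val is not False:
            options.append(f'-{key} {val}')
    return ' '.join(options)
```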
mmcv.video.resize_video(in_file, out_file, size=None, ratio=None, keep_ar=False, log_level='info', print_cmd=False)[source]

Resize a video.

Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • size (tuple) – Expected size (w, h), e.g., (320, 240) or (320, -1).
  • ratio (tuple or float) – Expected resize ratio, (2, 0.5) means (w*2, h*0.5).
  • keep_ar (bool) – Whether to keep original aspect ratio.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.cut_video(in_file, out_file, start=None, end=None, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Cut a clip from a video.

Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • start (None or float) – Start time (in seconds).
  • end (None or float) – End time (in seconds).
  • vcodec (None or str) – Output video codec, None for unchanged.
  • acodec (None or str) – Output audio codec, None for unchanged.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.concat_video(video_list, out_file, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Concatenate multiple videos into a single one.

Parameters:
  • video_list (list) – A list of video filenames.
  • out_file (str) – Output video filename.
  • vcodec (None or str) – Output video codec, None for unchanged.
  • acodec (None or str) – Output audio codec, None for unchanged.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.flowread(flow_or_path, quantize=False, concat_axis=0, *args, **kwargs)[source]

Read an optical flow map.

Parameters:
  • flow_or_path (ndarray or str) – A flow map or filepath.
  • quantize (bool) – Whether to read the quantized pair; if set to True, remaining args will be passed to dequantize_flow().
  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
Returns:

Optical flow represented as a (h, w, 2) numpy array

Return type:

ndarray

mmcv.video.flowwrite(flow, filename, quantize=False, concat_axis=0, *args, **kwargs)[source]

Write optical flow to file.

If the flow is not quantized, it will be saved as a .flo file losslessly, otherwise a jpeg image which is lossy but of much smaller size. (dx and dy will be concatenated horizontally into a single image if quantize is True.)

Parameters:
  • flow (ndarray) – (h, w, 2) array of optical flow.
  • filename (str) – Output filepath.
  • quantize (bool) – Whether to quantize the flow and save it to 2 jpeg images. If set to True, remaining args will be passed to quantize_flow().
  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
mmcv.video.quantize_flow(flow, max_val=0.02, norm=True)[source]

Quantize flow to [0, 255].

After this step, the size of flow will be much smaller, and can be dumped as jpeg images.

Parameters:
  • flow (ndarray) – (h, w, 2) array of optical flow.
  • max_val (float) – Maximum value of flow, values beyond [-max_val, max_val] will be truncated.
  • norm (bool) – Whether to divide flow values by image width/height.
Returns:

Quantized dx and dy.

Return type:

tuple[ndarray]

mmcv.video.dequantize_flow(dx, dy, max_val=0.02, denorm=True)[source]

Recover from quantized flow.

Parameters:
  • dx (ndarray) – Quantized dx.
  • dy (ndarray) – Quantized dy.
  • max_val (float) – Maximum value used when quantizing.
  • denorm (bool) – Whether to multiply flow values with width/height.
Returns:

Dequantized flow.

Return type:

ndarray

mmcv.video.flow_warp(img, flow, filling_value=0, interpolate_mode='nearest')[source]

Use flow to warp img.

Parameters:
  • img (ndarray, float or uint8) – Image to be warped.
  • flow (ndarray, float) – Optical Flow.
  • filling_value (int) – The missing pixels will be set with filling_value.
  • interpolate_mode (str) – bilinear -> Bilinear Interpolation; nearest -> Nearest Neighbor.
Returns:

Warped image with the same shape as img.

Return type:

ndarray

arraymisc

mmcv.arraymisc.quantize(arr, min_val, max_val, levels, dtype=<class 'numpy.int64'>)[source]

Quantize an array from (-inf, inf) to [0, levels-1].

Parameters:
  • arr (ndarray) – Input array.
  • min_val (scalar) – Minimum value to be clipped.
  • max_val (scalar) – Maximum value to be clipped.
  • levels (int) – Quantization levels.
  • dtype (np.type) – The type of the quantized array.
Returns:

Quantized array.

Return type:

ndarray
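
A numpy sketch of the clip-then-bin behavior and its inverse. This is an illustrative re-implementation, not the mmcv source; reconstructing bin centers in the dequantize step is an assumption:

```python
import numpy as np

def quantize_sketch(arr, min_val, max_val, levels, dtype=np.int64):
    # Clip to [min_val, max_val], then map linearly onto {0, ..., levels-1}.
    arr = np.clip(arr, min_val, max_val)
    q = (arr - min_val) / (max_val - min_val) * levels
    return np.minimum(q.astype(dtype), levels - 1)

def dequantize_sketch(q, min_val, max_val, levels, dtype=np.float64):
    # Map each bin index back to the center of its value range.
    return (q.astype(dtype) + 0.5) / levels * (max_val - min_val) + min_val
```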

mmcv.arraymisc.dequantize(arr, min_val, max_val, levels, dtype=<class 'numpy.float64'>)[source]

Dequantize an array.

Parameters:
  • arr (ndarray) – Input array.
  • min_val (scalar) – Minimum value to be clipped.
  • max_val (scalar) – Maximum value to be clipped.
  • levels (int) – Quantization levels.
  • dtype (np.type) – The type of the dequantized array.
Returns:

Dequantized array.

Return type:

ndarray

visualization

class mmcv.visualization.Color[source]

An enum that defines common colors.

Contains red, green, blue, cyan, yellow, magenta, white and black.

mmcv.visualization.color_val(color)[source]

Convert various inputs to color tuples.

Parameters:color (Color/str/tuple/int/ndarray) – Color inputs
Returns:A tuple of 3 integers indicating BGR channels.
Return type:tuple[int]
mmcv.visualization.imshow(img, win_name='', wait_time=0)[source]

Show an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
mmcv.visualization.imshow_bboxes(img, bboxes, colors='green', top_k=-1, thickness=1, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes on an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • bboxes (list or ndarray) – A list of ndarray of shape (k, 4).
  • colors (list[str or tuple or Color]) – A list of colors.
  • top_k (int) – Plot the first k bboxes only if set positive.
  • thickness (int) – Thickness of lines.
  • show (bool) – Whether to show the image.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
  • out_file (str, optional) – The filename to write the image.
mmcv.visualization.imshow_det_bboxes(img, bboxes, labels, class_names=None, score_thr=0, bbox_color='green', text_color='green', thickness=1, font_scale=0.5, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes and class labels (with scores) on an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • bboxes (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5).
  • labels (ndarray) – Labels of bboxes.
  • class_names (list[str]) – Names of each class.
  • score_thr (float) – Minimum score of bboxes to be shown.
  • bbox_color (str or tuple or Color) – Color of bbox lines.
  • text_color (str or tuple or Color) – Color of texts.
  • thickness (int) – Thickness of lines.
  • font_scale (float) – Font scales of texts.
  • show (bool) – Whether to show the image.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
  • out_file (str or None) – The filename to write the image.
mmcv.visualization.flowshow(flow, win_name='', wait_time=0)[source]

Show optical flow.

Parameters:
  • flow (ndarray or str) – The optical flow to be displayed.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
mmcv.visualization.flow2rgb(flow, color_wheel=None, unknown_thr=1000000.0)[source]

Convert flow map to RGB image.

Parameters:
  • flow (ndarray) – Array of optical flow.
  • color_wheel (ndarray or None) – Color wheel used to map flow field to RGB colorspace. Default color wheel will be used if not specified.
  • unknown_thr (float) – Values above this threshold will be marked as unknown and thus ignored.
Returns:

RGB image that can be visualized.

Return type:

ndarray

mmcv.visualization.make_color_wheel(bins=None)[source]

Build a color wheel.

Parameters:bins (list or tuple, optional) – Specify the number of bins for each color range, corresponding to six ranges: red -> yellow, yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, magenta -> red. [15, 6, 4, 11, 13, 6] is used for default (see Middlebury).
Returns:Color wheel of shape (total_bins, 3).
Return type:ndarray

utils

class mmcv.utils.Config(cfg_dict=None, cfg_text=None, filename=None)[source]

A facility for config and config files.

It supports common file formats as configs: python/json/yaml. The interface is the same as a dict object and also allows accessing config values as attributes.

Example

>>> cfg = Config(dict(a=1, b=dict(b1=[0, 1])))
>>> cfg.a
1
>>> cfg.b
{'b1': [0, 1]}
>>> cfg.b.b1
[0, 1]
>>> cfg = Config.fromfile('tests/data/config/a.py')
>>> cfg.filename
"/home/kchen/projects/mmcv/tests/data/config/a.py"
>>> cfg.item4
'test'
>>> cfg
"Config [path: /home/kchen/projects/mmcv/tests/data/config/a.py]: "
"{'item1': [1, 2], 'item2': {'a': 0}, 'item3': True, 'item4': 'test'}"
static auto_argparser(description=None)[source]

Generate an argparser from the config file automatically (experimental).

merge_from_dict(options)[source]

Merge a dict of options into cfg_dict.

Merge the dict parsed by MultipleKVAction into this cfg.

Examples

>>> options = {'model.backbone.depth': 50,
...            'model.backbone.with_cp':True}
>>> cfg = Config(dict(model=dict(backbone=dict(type='ResNet'))))
>>> cfg.merge_from_dict(options)
>>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict')
>>> assert cfg_dict == dict(
...     model=dict(backbone=dict(depth=50, with_cp=True)))
Parameters:options (dict) – dict of configs to merge from.
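The dotted-key merge shown above can be sketched on a plain dict. This is a simplified illustration of the assumed behavior, not the actual implementation, which operates on Config's internal _cfg_dict:

```python
# Sketch of merging {'a.b.c': value} style options into a nested dict,
# mirroring the merge_from_dict example above (assumed behavior).

def merge_dotted_options(cfg, options):
    """Merge dotted-key options into a nested dict in place."""
    for full_key, value in options.items():
        d = cfg
        keys = full_key.split('.')
        for key in keys[:-1]:
            # Create intermediate dicts as needed.
            d = d.setdefault(key, {})
        d[keys[-1]] = value
    return cfg

cfg = {'model': {'backbone': {'type': 'ResNet'}}}
merge_dotted_options(cfg, {'model.backbone.depth': 50,
                           'model.backbone.with_cp': True})
# cfg['model']['backbone'] now also contains depth=50 and with_cp=True
```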
class mmcv.utils.ConfigDict(*args, **kwargs)[source]
class mmcv.utils.DictAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

An argparse action that splits each argument into KEY=VALUE form on the first '=' and appends it to a dictionary. List options should be passed as comma-separated values, e.g., KEY=V1,V2,V3.
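The split-on-first-'=' behavior can be sketched with a plain argparse action. This is a simplified stand-in: it keeps all values as strings, while the real DictAction may additionally attempt type conversion:

```python
import argparse

# A minimal DictAction-style action (assumed behavior): split each
# KEY=VALUE argument on the first '=', and treat comma-separated
# values as a list.
class SimpleDictAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        result = getattr(namespace, self.dest, None) or {}
        for kv in values:
            key, val = kv.split('=', 1)
            parts = val.split(',')
            result[key] = parts if len(parts) > 1 else val
        setattr(namespace, self.dest, result)

parser = argparse.ArgumentParser()
parser.add_argument('--options', nargs='+', action=SimpleDictAction)
args = parser.parse_args(['--options', 'lr=0.01', 'steps=1,2,3'])
# args.options == {'lr': '0.01', 'steps': ['1', '2', '3']}
```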

mmcv.utils.get_logger(name, log_file=None, log_level=20)[source]

Initialize and get a logger by name.

If the logger has not been initialized, this method will initialize the logger by adding one or two handlers, otherwise the initialized logger will be directly returned. During initialization, a StreamHandler will always be added. If log_file is specified and the process rank is 0, a FileHandler will also be added.

Parameters:
  • name (str) – Logger name.
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the logger.
  • log_level (int) – The logger level. Note that only the process of rank 0 is affected, and other processes will set the level to “Error” thus be silent most of the time.
Returns:

The expected logger.

Return type:

logging.Logger
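The initialize-once pattern described above can be sketched with the standard logging module. This is a simplified version: it omits the distributed rank-0 handling, and the set-based registry is an assumption of this sketch:

```python
import logging

# Loggers that have already been initialized by this sketch.
_initialized = set()

def get_logger_sketch(name, log_file=None, log_level=logging.INFO):
    """Return a logger, adding handlers only on first initialization."""
    logger = logging.getLogger(name)
    if name in _initialized:
        # Already initialized: return it directly, no duplicate handlers.
        return logger
    # A StreamHandler is always added on first initialization.
    logger.addHandler(logging.StreamHandler())
    if log_file is not None:
        logger.addHandler(logging.FileHandler(log_file))
    logger.setLevel(log_level)
    _initialized.add(name)
    return logger

logger = get_logger_sketch('demo')
logger.info('hello')
```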

mmcv.utils.print_log(msg, logger=None, level=20)[source]

Print a log message.

Parameters:
  • msg (str) – The message to be logged.
  • logger (logging.Logger | str | None) – The logger to be used. Some special loggers are:

    • “silent”: no message will be printed.
    • other str: the logger obtained with get_root_logger(logger).
    • None: the print() method will be used to print log messages.
  • level (int) – Logging level. Only available when logger is a Logger object or “root”.
mmcv.utils.is_str(x)[source]

Whether the input is a string instance.

Note: This method is deprecated since Python 2 is no longer supported.

mmcv.utils.iter_cast(inputs, dst_type, return_type=None)[source]

Cast elements of an iterable object into some type.

Parameters:
  • inputs (Iterable) – The input object.
  • dst_type (type) – Destination type.
  • return_type (type, optional) – If specified, the output object will be converted to this type, otherwise an iterator.
Returns:

The converted object.

Return type:

iterator or specified type
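The described behavior of iter_cast can be sketched in a few lines (a simplified illustration of the assumed semantics, not the actual implementation):

```python
# Sketch of iter_cast: cast every element, then optionally wrap the
# resulting iterator in a container type.

def iter_cast_sketch(inputs, dst_type, return_type=None):
    if not isinstance(dst_type, type):
        raise TypeError('"dst_type" must be a valid type')
    out = map(dst_type, inputs)
    # With return_type=None a lazy iterator is returned; otherwise the
    # iterator is materialized into the requested container.
    return out if return_type is None else return_type(out)

ints = iter_cast_sketch(['1', '2', '3'], int, return_type=list)
# ints == [1, 2, 3]
```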

mmcv.utils.list_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a list of some type.

A partial method of iter_cast().

mmcv.utils.tuple_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a tuple of some type.

A partial method of iter_cast().

mmcv.utils.is_seq_of(seq, expected_type, seq_type=None)[source]

Check whether it is a sequence of some type.

Parameters:
  • seq (Sequence) – The sequence to be checked.
  • expected_type (type) – Expected type of sequence items.
  • seq_type (type, optional) – Expected sequence type.
Returns:

Whether the sequence is valid.

Return type:

bool

mmcv.utils.is_list_of(seq, expected_type)[source]

Check whether it is a list of some type.

A partial method of is_seq_of().

mmcv.utils.is_tuple_of(seq, expected_type)[source]

Check whether it is a tuple of some type.

A partial method of is_seq_of().
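The type-checking trio above can be sketched with functools.partial, which is how the "partial method" relationship is naturally expressed (a hedged sketch of the assumed behavior):

```python
from collections.abc import Sequence
from functools import partial

def is_seq_of_sketch(seq, expected_type, seq_type=None):
    """Check that seq is a sequence whose items all have expected_type."""
    exp_seq_type = Sequence if seq_type is None else seq_type
    if not isinstance(seq, exp_seq_type):
        return False
    return all(isinstance(item, expected_type) for item in seq)

# The list/tuple variants pin down seq_type.
is_list_of_sketch = partial(is_seq_of_sketch, seq_type=list)
is_tuple_of_sketch = partial(is_seq_of_sketch, seq_type=tuple)

is_list_of_sketch([1, 2, 3], int)   # True
is_tuple_of_sketch([1, 2, 3], int)  # False: a list, not a tuple
```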

mmcv.utils.slice_list(in_list, lens)[source]

Slice a list into several sub-lists by a list of given lengths.

Parameters:
  • in_list (list) – The list to be sliced.
  • lens (int or list) – The expected length of each out list.
Returns:

A list of sliced sub-lists.

Return type:

list

mmcv.utils.concat_list(in_list)[source]

Concatenate a list of lists into a single flat list.

Parameters:in_list (list) – The list of lists to be merged.
Returns:The concatenated flat list.
Return type:list
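The two list utilities above can be sketched as follows (assumed behavior; in particular, treating an int lens as a uniform sub-list length is an assumption of this sketch):

```python
from itertools import chain

def slice_list_sketch(in_list, lens):
    """Slice in_list into sub-lists whose lengths are given by lens."""
    if isinstance(lens, int):
        # An int is treated as a uniform sub-list length.
        assert len(in_list) % lens == 0
        lens = [lens] * (len(in_list) // lens)
    if sum(lens) != len(in_list):
        raise ValueError('"lens" must sum to the list length')
    out, idx = [], 0
    for n in lens:
        out.append(in_list[idx:idx + n])
        idx += n
    return out

def concat_list_sketch(in_list):
    # Flattening one level is just itertools.chain.
    return list(chain.from_iterable(in_list))

slice_list_sketch([1, 2, 3, 4, 5, 6], [2, 4])  # [[1, 2], [3, 4, 5, 6]]
concat_list_sketch([[1, 2], [3, 4, 5, 6]])     # [1, 2, 3, 4, 5, 6]
```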
mmcv.utils.check_prerequisites(prerequisites, checker, msg_tmpl='Prerequisites "{}" are required in method "{}" but not found, please install them first.')[source]

A decorator factory to check if prerequisites are satisfied.

Parameters:
  • prerequisites (str or list[str]) – Prerequisites to be checked.
  • checker (callable) – The checker method that returns True if a prerequisite is met, False otherwise.
  • msg_tmpl (str) – The message template with two variables.
Returns:

A specific decorator.

Return type:

decorator
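The decorator-factory pattern can be sketched as below. This is a hedged sketch of the assumed behavior; the `installed` set and membership checker are hypothetical stand-ins for a real package check:

```python
import functools

def check_prerequisites_sketch(
        prerequisites, checker,
        msg_tmpl='Prerequisites "{}" are required in method "{}" but not found.'):
    """Decorator factory: verify prerequisites before calling the function."""
    def wrap(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            reqs = [prerequisites] if isinstance(prerequisites, str) else prerequisites
            missing = [item for item in reqs if not checker(item)]
            if missing:
                raise RuntimeError(msg_tmpl.format(', '.join(missing), func.__name__))
            return func(*args, **kwargs)
        return wrapped
    return wrap

# Hypothetical checker: membership in a set of "installed" names.
installed = {'numpy'}

@check_prerequisites_sketch('numpy', checker=installed.__contains__)
def ok():
    return 1

ok()  # runs; a missing prerequisite would raise RuntimeError instead
```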

mmcv.utils.requires_package(prerequisites)[source]

A decorator to check if some python packages are installed.

Example

>>> @requires_package('numpy')
>>> def func(arg1, args):
>>>     return numpy.zeros(1)
array([0.])
>>> @requires_package(['numpy', 'non_package'])
>>> def func(arg1, args):
>>>     return numpy.zeros(1)
ImportError
mmcv.utils.requires_executable(prerequisites)[source]

A decorator to check if some executable files are installed.

Example

>>> @requires_executable('ffmpeg')
>>> def func(arg1, args):
>>>     print(1)
1
mmcv.utils.scandir(dir_path, suffix=None, recursive=False)[source]

Scan a directory to find files of interest.

Parameters:
  • dir_path (str | Path) – Path of the directory.
  • suffix (str | tuple(str), optional) – File suffix that we are interested in. Default: None.
  • recursive (bool, optional) – If set to True, recursively scan the directory. Default: False.
Returns:

A generator for all the files of interest, with paths relative to the directory.
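The generator described above can be sketched with os.scandir (an assumed simplification; the real function may differ in filtering details):

```python
import os

def scandir_sketch(dir_path, suffix=None, recursive=False):
    """Yield paths of matching files, relative to dir_path."""
    if suffix is not None and not isinstance(suffix, (str, tuple)):
        raise TypeError('"suffix" must be a string or tuple of strings')

    def _scan(root, base):
        for entry in os.scandir(root):
            rel = os.path.relpath(entry.path, base)
            if entry.is_file():
                # str.endswith accepts a tuple of suffixes directly.
                if suffix is None or rel.endswith(suffix):
                    yield rel
            elif recursive and entry.is_dir():
                yield from _scan(entry.path, base)

    return _scan(dir_path, dir_path)
```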

class mmcv.utils.ProgressBar(task_num=0, bar_width=50, start=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

A progress bar which can print the progress.

mmcv.utils.track_progress(func, tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, **kwargs)[source]

Track the progress of tasks execution with a progress bar.

Tasks are done with a simple for-loop.

Parameters:
  • func (callable) – The function to be applied to each task.
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • bar_width (int) – Width of progress bar.
Returns:

The task results.

Return type:

list

mmcv.utils.track_iter_progress(tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of tasks iteration or enumeration with a progress bar.

Tasks are yielded with a simple for-loop.

Parameters:
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • bar_width (int) – Width of progress bar.
Yields:

list – The task results.

mmcv.utils.track_parallel_progress(func, tasks, nproc, initializer=None, initargs=None, bar_width=50, chunksize=1, skip_first=False, keep_order=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of parallel task execution with a progress bar.

The built-in multiprocessing module is used for process pools and tasks are done with Pool.map() or Pool.imap_unordered().

Parameters:
  • func (callable) – The function to be applied to each task.
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • nproc (int) – Process (worker) number.
  • initializer (None or callable) – Refer to multiprocessing.Pool for details.
  • initargs (None or tuple) – Refer to multiprocessing.Pool for details.
  • chunksize (int) – Refer to multiprocessing.Pool for details.
  • bar_width (int) – Width of progress bar.
  • skip_first (bool) – Whether to skip the first sample for each worker when estimating fps, since the initialization step may take longer.
  • keep_order (bool) – If True, Pool.imap() is used, otherwise Pool.imap_unordered() is used.
Returns:

The task results.

Return type:

list

class mmcv.utils.Registry(name)[source]

A registry to map strings to classes.

Parameters:name (str) – Registry name.
get(key)[source]

Get the registry record.

Parameters:key (str) – The class name in string format.
Returns:The corresponding class.
Return type:class
register_module(name=None, force=False, module=None)[source]

Register a module.

A record will be added to self._module_dict, whose key is the class name or the specified name, and value is the class itself. It can be used as a decorator or a normal function.

Example

>>> backbones = Registry('backbone')
>>> @backbones.register_module()
>>> class ResNet:
>>>     pass
>>> backbones = Registry('backbone')
>>> @backbones.register_module(name='mnet')
>>> class MobileNet:
>>>     pass
>>> backbones = Registry('backbone')
>>> class ResNet:
>>>     pass
>>> backbones.register_module(ResNet)
Parameters:
  • name (str | None) – The module name to be registered. If not specified, the class name will be used.
  • force (bool, optional) – Whether to override an existing class with the same name. Default: False.
  • module (type) – Module class to be registered.
mmcv.utils.build_from_cfg(cfg, registry, default_args=None)[source]

Build a module from config dict.

Parameters:
  • cfg (dict) – Config dict. It should at least contain the key “type”.
  • registry (Registry) – The registry to search the type from.
  • default_args (dict, optional) – Default initialization arguments.
Returns:

The constructed object.

Return type:

object
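The Registry/build_from_cfg pattern described above can be sketched in miniature. This is a hedged simplification (no force handling, no error-message details), intended only to show how registration and config-driven construction fit together:

```python
class RegistrySketch:
    """Minimal string-to-class registry."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def get(self, key):
        return self._module_dict.get(key)

    def register_module(self, name=None, module=None):
        if module is not None:
            # Used as a plain function: register_module(module=Cls).
            self._module_dict[name or module.__name__] = module
            return module

        def decorator(cls):
            # Used as a decorator: @registry.register_module().
            self._module_dict[name or cls.__name__] = cls
            return cls
        return decorator


def build_from_cfg_sketch(cfg, registry, default_args=None):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)
    if default_args is not None:
        for k, v in default_args.items():
            args.setdefault(k, v)
    obj_cls = registry.get(args.pop('type'))
    if obj_cls is None:
        raise KeyError(f'{cfg["type"]} is not registered in {registry.name}')
    return obj_cls(**args)


backbones = RegistrySketch('backbone')

@backbones.register_module()
class ResNet:
    def __init__(self, depth=50):
        self.depth = depth

model = build_from_cfg_sketch(dict(type='ResNet', depth=101), backbones)
# model.depth == 101
```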

class mmcv.utils.Timer(start=True, print_tmpl=None)[source]

A flexible Timer class.

Example:
>>> import time
>>> import mmcv
>>> with mmcv.Timer():
>>>     # simulate a code block that will run for 1s
>>>     time.sleep(1)
1.000
>>> with mmcv.Timer(print_tmpl='it takes {:.1f} seconds'):
>>>     # simulate a code block that will run for 1s
>>>     time.sleep(1)
it takes 1.0 seconds
>>> timer = mmcv.Timer()
>>> time.sleep(0.5)
>>> print(timer.since_start())
0.500
>>> time.sleep(0.5)
>>> print(timer.since_last_check())
0.500
>>> print(timer.since_start())
1.000
is_running

indicate whether the timer is running

Type:bool
since_last_check()[source]

Time since the last checking.

Either since_start() or since_last_check() is a checking operation.

Returns (float): Time in seconds.

since_start()[source]

Total time since the timer is started.

Returns (float): Time in seconds.

start()[source]

Start the timer.

exception mmcv.utils.TimerError(message)[source]
mmcv.utils.check_time(timer_id)[source]

Add check points in a single line.

This method is suitable for running a task on a list of items. A timer will be registered when the method is called for the first time.

Example:
>>> import time
>>> import mmcv
>>> for i in range(1, 6):
>>>     # simulate a code block
>>>     time.sleep(i)
>>>     mmcv.check_time('task1')
2.000
3.000
4.000
5.000
Parameters:timer_id (str) – Timer identifier.
class mmcv.utils.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Optional[Any] = None)[source]
class mmcv.utils.BuildExtension(*args, **kwargs)[source]

A custom setuptools build extension.

This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++14) as well as mixed C++/CUDA compilation (and support for CUDA files in general).

When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.

use_ninja (bool): If use_ninja is True (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard setuptools.build_ext. Falls back to the standard distutils backend if Ninja is not available.

Note

By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the MAX_JOBS environment variable to a non-negative number.

finalize_options()[source]

Set final values for all the options that this command supports. This is always called as late as possible, ie. after any option assignments from the command-line or from other commands have been done. Thus, this is the place to code option dependencies: if ‘foo’ depends on ‘bar’, then it is safe to set ‘foo’ from ‘bar’ as long as ‘foo’ still has the same value it was assigned in ‘initialize_options()’.

This method must be implemented by all command classes.

get_ext_filename(ext_name)[source]

Convert the name of an extension (eg. “foo.bar”) into the name of the file from which it will be loaded (eg. “foo/bar.so”, or “foobar.pyd”).

classmethod with_options(**options)[source]

Returns a subclass with alternative constructor that extends any original keyword arguments to the original constructor with the given options.

mmcv.utils.CppExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CppExtension
>>> setup(
        name='extension',
        ext_modules=[
            CppExtension(
                name='extension',
                sources=['extension.cpp'],
                extra_compile_args=['-g']),
        ],
        cmdclass={
            'build_ext': BuildExtension
        })
mmcv.utils.CUDAExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for CUDA/C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension
>>> setup(
        name='cuda_extension',
        ext_modules=[
            CUDAExtension(
                    name='cuda_extension',
                    sources=['extension.cpp', 'extension_kernel.cu'],
                    extra_compile_args={'cxx': ['-g'],
                                        'nvcc': ['-O2']})
        ],
        cmdclass={
            'build_ext': BuildExtension
        })
class mmcv.utils.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None)[source]

Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.

See torch.utils.data documentation page for more details.

Parameters:
  • dataset (Dataset) – dataset from which to load the data.
  • batch_size (int, optional) – how many samples per batch to load (default: 1).
  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
  • batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
  • collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below.
  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
  • worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)

Warning

If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. See multiprocessing-best-practices on more details related to multiprocessing in PyTorch.

Warning

len(dataloader) heuristic is based on the length of the sampler used. When dataset is an IterableDataset, it instead returns an estimate based on len(dataset) / batch_size, with proper rounding depending on drop_last, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user dataset code in correctly handling multi-process loading to avoid duplicate data.

However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when drop_last is set. Unfortunately, PyTorch can not detect such cases in general.

See Dataset Types in the torch.utils.data documentation for more details on these two types of datasets and how IterableDataset interacts with multi-process data loading.

mmcv.utils.PoolDataLoader

alias of torch.utils.data.dataloader.DataLoader

mmcv.utils.deprecated_api_warning(name_dict, cls_name=None)[source]

A decorator to check if some arguments are deprecated and try to replace the deprecated src_arg_name with dst_arg_name.

Parameters:name_dict (dict) – key (str): Deprecated argument names. val (str): Expected argument names.
Returns:New function.
Return type:func
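The deprecated-argument rename can be sketched as below (assumed behavior: a deprecated keyword is warned about and forwarded under its new name; the `resize` function and its parameters are hypothetical):

```python
import functools
import warnings

def deprecated_api_warning_sketch(name_dict, cls_name=None):
    """Decorator factory: rename deprecated keyword arguments with a warning."""
    def wrap(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            for src, dst in name_dict.items():
                if src in kwargs:
                    warnings.warn(f'"{src}" is deprecated, use "{dst}" instead')
                    # Forward the value under the expected argument name.
                    kwargs[dst] = kwargs.pop(src)
            return func(*args, **kwargs)
        return wrapped
    return wrap

@deprecated_api_warning_sketch({'img_size': 'size'})
def resize(img, size=None):
    return size

# resize('img', img_size=(2, 2)) warns and returns (2, 2)
```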

cnn

class mmcv.cnn.AlexNet(num_classes=-1)[source]

AlexNet backbone.

Parameters:num_classes (int) – number of classes for classification.
class mmcv.cnn.VGG(depth, with_bn=False, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), frozen_stages=-1, bn_eval=True, bn_frozen=False, ceil_mode=False, with_last_pool=True)[source]

VGG backbone.

Parameters:
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.
  • with_bn (bool) – Use BatchNorm or not.
  • num_classes (int) – number of classes for classification.
  • num_stages (int) – VGG stages, normally 5.
  • dilations (Sequence[int]) – Dilation of each stage.
  • out_indices (Sequence[int]) – Output from which stages.
  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:self
Return type:Module
class mmcv.cnn.ResNet(depth, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', frozen_stages=-1, bn_eval=True, bn_frozen=False, with_cp=False)[source]

ResNet backbone.

Parameters:
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
  • num_stages (int) – Resnet stages, normally 4.
  • strides (Sequence[int]) – Strides of the first block of each stage.
  • dilations (Sequence[int]) – Dilation of each stage.
  • out_indices (Sequence[int]) – Output from which stages.
  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:self
Return type:Module
mmcv.cnn.bias_init_with_prob(prior_prob)[source]

Initialize conv/fc bias value according to a given probability.

class mmcv.cnn.ConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias='auto', conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, inplace=True, with_spectral_norm=False, padding_mode='zeros', order=('conv', 'norm', 'act'))[source]

A conv block that bundles conv/norm/activation layers.

This block simplifies the usage of convolution layers, which are commonly used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU). It is based upon three build methods: build_conv_layer(), build_norm_layer() and build_activation_layer().

Besides, we add some additional features in this module. 1. Automatically set bias of the conv layer. 2. Spectral norm is supported. 3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only supported zero and circular padding, so we add a “reflect” padding mode.

Parameters:
  • in_channels (int) – Same as nn.Conv2d.
  • out_channels (int) – Same as nn.Conv2d.
  • kernel_size (int | tuple[int]) – Same as nn.Conv2d.
  • stride (int | tuple[int]) – Same as nn.Conv2d.
  • padding (int | tuple[int]) – Same as nn.Conv2d.
  • dilation (int | tuple[int]) – Same as nn.Conv2d.
  • groups (int) – Same as nn.Conv2d.
  • bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False. Default: “auto”.
  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
  • norm_cfg (dict) – Config dict for normalization layer. Default: None.
  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
  • inplace (bool) – Whether to use inplace mode for activation. Default: True.
  • with_spectral_norm (bool) – Whether use spectral norm in conv module. Default: False.
  • padding_mode (str) – If the padding_mode has not been supported by current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
  • order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Common examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”). Default: (‘conv’, ‘norm’, ‘act’).
mmcv.cnn.build_activation_layer(cfg)[source]

Build activation layer.

Parameters:cfg (dict) – The activation layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate an activation layer.
Returns:Created activation layer.
Return type:nn.Module
mmcv.cnn.build_conv_layer(cfg, *args, **kwargs)[source]

Build convolution layer.

Parameters:
  • cfg (None or dict) – The conv layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a conv layer.
  • args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.
  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.
Returns:

Created conv layer.

Return type:

nn.Module

mmcv.cnn.build_norm_layer(cfg, num_features, postfix='')[source]

Build normalization layer.

Parameters:
  • cfg (dict) –

    The norm layer config, which should contain:

    • type (str): Layer type.
    • layer args: Args needed to instantiate a norm layer.
    • requires_grad (bool, optional): Whether stop gradient updates.
  • num_features (int) – Number of input channels.
  • postfix (int | str) – The postfix to be appended into norm abbreviation to create named layer.
Returns:

The first element is the layer name consisting of abbreviation and postfix, e.g., bn1, gn. The second element is the created norm layer.

Return type:

(str, nn.Module)

mmcv.cnn.build_padding_layer(cfg, *args, **kwargs)[source]

Build padding layer.

Parameters:cfg (None or dict) – The padding layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a padding layer.
Returns:Created padding layer.
Return type:nn.Module
mmcv.cnn.build_upsample_layer(cfg, *args, **kwargs)[source]

Build upsample layer.

Parameters:
  • cfg (dict) –

    The upsample layer config, which should contain:

    • type (str): Layer type.
    • scale_factor (int): Upsample ratio, which is not applicable to
      deconv.
    • layer args: Args needed to instantiate an upsample layer.
  • args (argument list) – Arguments passed to the __init__ method of the corresponding upsample layer.
  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding upsample layer.
Returns:

Created upsample layer.

Return type:

nn.Module

mmcv.cnn.build_plugin_layer(cfg, postfix='', **kwargs)[source]

Build plugin layer.

Parameters:
  • cfg (None or dict) – cfg should contain: type (str): identify plugin layer type. layer args: args needed to instantiate a plugin layer.
  • postfix (int, str) – appended into norm abbreviation to create named layer. Default: ‘’.
Returns:

name (str): abbreviation + postfix. layer (nn.Module): created plugin layer.

Return type:

tuple[str, nn.Module]

mmcv.cnn.is_norm(layer, exclude=None)[source]

Check if a layer is a normalization layer.

Parameters:
  • layer (nn.Module) – The layer to be checked.
  • exclude (type | tuple[type]) – Types to be excluded.
Returns:

Whether the layer is a norm layer.

Return type:

bool

class mmcv.cnn.NonLocal1d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv1d'}, **kwargs)[source]

1D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv1d’).
class mmcv.cnn.NonLocal2d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv2d'}, **kwargs)[source]

2D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv2d’).
class mmcv.cnn.NonLocal3d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv3d'}, **kwargs)[source]

3D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv3d’).
class mmcv.cnn.ContextBlock(in_channels, ratio, pooling_type='att', fusion_types=('channel_add', ))[source]

ContextBlock module in GCNet.

See ‘GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond’ (https://arxiv.org/abs/1904.11492) for details.

Parameters:
  • in_channels (int) – Channels of the input feature map.
  • ratio (float) – Ratio of channels of the transform bottleneck.
  • pooling_type (str) – Pooling method for context modeling. Options are ‘att’ and ‘avg’, standing for attention pooling and average pooling respectively. Default: ‘att’.
  • fusion_types (Sequence[str]) – Fusion method for feature fusion. Options are ‘channel_add’ and ‘channel_mul’, standing for channel-wise addition and multiplication respectively. Default: (‘channel_add’,).
class mmcv.cnn.HSigmoid[source]

Hard Sigmoid Module. Apply the hard sigmoid function: Hsigmoid(x) = min(max((x + 1) / 2, 0), 1)

Returns:The output tensor.
Return type:Tensor
class mmcv.cnn.HSwish(inplace=False)[source]

Hard Swish Module.

This module applies the hard swish function:

\[Hswish(x) = x * ReLU6(x + 3) / 6\]
Parameters:inplace (bool) – can optionally do the operation in-place. Default: False.
Returns:The output tensor.
Return type:Tensor
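The two activations above reduce to simple elementwise formulas; a pure-Python sketch of the scalar math (ReLU6(x) is min(max(x, 0), 6)):

```python
# Scalar sketches of the HSigmoid and HSwish formulas given above.

def hsigmoid(x):
    # HSigmoid(x) = min(max((x + 1) / 2, 0), 1)
    return min(max((x + 1) / 2, 0), 1)

def hswish(x):
    # HSwish(x) = x * ReLU6(x + 3) / 6
    return x * min(max(x + 3, 0), 6) / 6

hsigmoid(0)  # 0.5
hswish(3)    # 3.0 (ReLU6(6)/6 == 1, so the input passes through)
```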
class mmcv.cnn.GeneralizedAttention(in_channels, spatial_range=-1, num_heads=9, position_embedding_dim=-1, position_magnitude=1, kv_stride=2, q_stride=1, attention_type='1111')[source]

GeneralizedAttention module.

See ‘An Empirical Study of Spatial Attention Mechanisms in Deep Networks’ (https://arxiv.org/abs/1711.07971) for details.

Parameters:
  • in_channels (int) – Channels of the input feature map.
  • spatial_range (int) – The spatial range. -1 indicates no spatial range constraint. Default: -1.
  • num_heads (int) – The head number of empirical_attention module. Default: 9.
  • position_embedding_dim (int) – The position embedding dimension. Default: -1.
  • position_magnitude (int) – A multiplier acting on coord difference. Default: 1.
  • kv_stride (int) – The feature stride acting on key/value feature map. Default: 2.
  • q_stride (int) – The feature stride acting on query feature map. Default: 1.
  • attention_type (str) –

    A binary indicator string for indicating which items in generalized empirical_attention module are used. Default: ‘1111’.

    • ‘1000’ indicates ‘query and key content’ (appr - appr) item,
    • ‘0100’ indicates ‘query content and relative position’ (appr - position) item,
    • ‘0010’ indicates ‘key content only’ (bias - appr) item,
    • ‘0001’ indicates ‘relative position only’ (bias - position) item.
class mmcv.cnn.Scale(scale=1.0)[source]

A learnable scale parameter.

This layer scales the input by a learnable factor. It multiplies a learnable scale parameter of shape (1,) with input of any shape.

Parameters:scale (float) – Initial value of scale factor. Default: 1.0
mmcv.cnn.get_model_complexity_info(model, input_shape, print_per_layer_stat=True, as_strings=True, input_constructor=None, flush=False, ost=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Get complexity information of a model.

This method can calculate FLOPs and parameter counts of a model with corresponding input shape. It can also print complexity information for each layer in a model.

Supported layers are listed below:
  • Convolutions: nn.Conv1d, nn.Conv2d, nn.Conv3d.
  • Activations: nn.ReLU, nn.PReLU, nn.ELU, nn.LeakyReLU, nn.ReLU6.
  • Poolings: nn.MaxPool1d, nn.MaxPool2d, nn.MaxPool3d, nn.AvgPool1d, nn.AvgPool2d, nn.AvgPool3d, nn.AdaptiveMaxPool1d, nn.AdaptiveMaxPool2d, nn.AdaptiveMaxPool3d, nn.AdaptiveAvgPool1d, nn.AdaptiveAvgPool2d, nn.AdaptiveAvgPool3d.
  • BatchNorms: nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d.
  • Linear: nn.Linear.
  • Deconvolution: nn.ConvTranspose2d.
  • Upsample: nn.Upsample.
Parameters:
  • model (nn.Module) – The model for complexity calculation.
  • input_shape (tuple) – Input shape used for calculation.
  • print_per_layer_stat (bool) – Whether to print complexity information for each layer in a model. Default: True.
  • as_strings (bool) – Output FLOPs and params counts in a string form. Default: True.
  • input_constructor (None | callable) – If specified, it takes a callable method that generates input. Otherwise, it will generate a random tensor with the input shape to calculate FLOPs. Default: None.
  • flush (bool) – same as that in print(). Default: False.
  • ost (stream) – same as file param in print(). Default: sys.stdout.
Returns:

If as_strings is set to True, it will return FLOPs and parameter counts in string format; otherwise, it will return them as floats.

Return type:

tuple[float | str]

class mmcv.cnn.ConvAWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

AWS (Adaptive Weight Standardization)

This is a variant of Weight Standardization (https://arxiv.org/pdf/1903.10520.pdf). It is used in DetectoRS to avoid NaN values (https://arxiv.org/pdf/2006.02334.pdf).

Parameters:
  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the conv kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If set True, adds a learnable bias to the output. Default: True
class mmcv.cnn.ConvWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, eps=1e-05)[source]
mmcv.cnn.fuse_conv_bn(module)[source]

Recursively fuse conv and bn in a module.

During inference, batch norm layers no longer update their statistics; only the per-channel mean and var are applied, which makes it possible to fold them into the preceding conv layers to save computation and simplify the network structure.

Parameters:module (nn.Module) – Module to be fused.
Returns:Fused module.
Return type:nn.Module
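For intuition, a minimal NumPy sketch of the folding arithmetic behind this fusion (fuse_conv_bn_params is a hypothetical helper; the real function operates on nn.Module instances and rewrites them in place):

```python
import numpy as np

def fuse_conv_bn_params(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen BatchNorm statistics into the preceding conv.

    w: conv weight (out_c, in_c, kh, kw); b: conv bias (out_c,);
    gamma/beta/mean/var: per-channel BN parameters, each (out_c,).
    """
    scale = gamma / np.sqrt(var + eps)        # per-output-channel BN factor
    w_fused = w * scale[:, None, None, None]  # rescale each output filter
    b_fused = (b - mean) * scale + beta       # fold mean shift and beta into bias
    return w_fused, b_fused
```

Applying the fused conv then gives the same output as conv followed by (frozen) batch norm.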

runner

class mmcv.runner.BaseRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None)[source]

The base class of Runner, a training helper for PyTorch.

All subclasses should implement the following APIs:

  • run()
  • train()
  • val()
  • save_checkpoint()
Parameters:
  • model (torch.nn.Module) – The model to be run.
  • batch_processor (callable) – A callable method that processes a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict
  • optimizer (dict or torch.optim.Optimizer) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).
  • work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.
  • logger (logging.Logger) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)
  • meta (dict | None) – A dict that records important information such as environment info and seed, which will be logged by the logger hook. Defaults to None.
call_hook(fn_name)[source]

Call all hooks.

Parameters:fn_name (str) – The function name in each hook to be called, such as “before_train_epoch”.
current_lr()[source]

Get current learning rates.

Returns:Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.
Return type:list[float] | dict[str, list[float]]
current_momentum()[source]

Get current momentums.

Returns:Current momentums of all param groups. If the runner has a dict of optimizers, this method will return a dict.
Return type:list[float] | dict[str, list[float]]
epoch

Current epoch.

Type:int
hooks

A list of registered hooks.

Type:list[Hook]
inner_iter

Iteration in an epoch.

Type:int
iter

Current iteration.

Type:int
max_epochs

Maximum training epochs.

Type:int
max_iters

Maximum training iterations.

Type:int
model_name

Name of the model, usually the module class name.

Type:str
rank

Rank of current process. (distributed training)

Type:int
register_hook(hook, priority='NORMAL')[source]

Register a hook into the hook list.

The hook will be inserted into a priority queue, with the specified priority (see Priority for details of priorities). Hooks with the same priority are triggered in the order in which they are registered.

Parameters:
  • hook (Hook) – The hook to be registered.
  • priority (int or str or Priority) – Hook priority. Lower value means higher priority.
register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None)[source]

Register default hooks for training.

Default hooks include:

  • LrUpdaterHook
  • MomentumUpdaterHook
  • OptimizerStepperHook
  • CheckpointSaverHook
  • IterTimerHook
  • LoggerHook(s)
world_size

Number of processes participating in the job. (distributed training)

Type:int
class mmcv.runner.Runner(*args, **kwargs)[source]

Deprecated name of EpochBasedRunner.

class mmcv.runner.EpochBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None)[source]

Epoch-based Runner.

This runner trains models epoch by epoch.

run(data_loaders, workflow, max_epochs, **kwargs)[source]

Start running.

Parameters:
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.
  • workflow (list[tuple]) – A list of (phase, epochs) to specify the running order and epochs. E.g., [(‘train’, 2), (‘val’, 1)] means running 2 epochs for training and 1 epoch for validation, iteratively.
  • max_epochs (int) – Total training epochs.
save_checkpoint(out_dir, filename_tmpl='epoch_{}.pth', save_optimizer=True, meta=None, create_symlink=True)[source]

Save the checkpoint.

Parameters:
  • out_dir (str) – The directory that checkpoints are saved.
  • filename_tmpl (str, optional) – The checkpoint filename template, which contains a placeholder for the epoch number. Defaults to ‘epoch_{}.pth’.
  • save_optimizer (bool, optional) – Whether to save the optimizer to the checkpoint. Defaults to True.
  • meta (dict, optional) – The meta information to be saved in the checkpoint. Defaults to None.
  • create_symlink (bool, optional) – Whether to create a symlink “latest.pth” to point to the latest checkpoint. Defaults to True.
class mmcv.runner.IterBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None)[source]

Iteration-based Runner.

This runner trains models iteration by iteration.

register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None)[source]

Register default hooks for iter-based training.

Default hooks include:

  • LrUpdaterHook
  • MomentumUpdaterHook
  • OptimizerStepperHook
  • CheckpointSaverHook
  • IterTimerHook
  • LoggerHook(s)
resume(checkpoint, resume_optimizer=True, map_location='default')[source]

Resume model from checkpoint.

Parameters:
  • checkpoint (str) – Checkpoint to resume from.
  • resume_optimizer (bool, optional) – Whether to resume the optimizer(s) if the checkpoint file includes optimizer(s). Defaults to True.
  • map_location (str, optional) – Same as torch.load(). Defaults to ‘default’.
run(data_loaders, workflow, max_iters, **kwargs)[source]

Start running.

Parameters:
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.
  • workflow (list[tuple]) – A list of (phase, iters) to specify the running order and iterations. E.g., [(‘train’, 10000), (‘val’, 1000)] means running 10000 iterations for training and 1000 iterations for validation, iteratively.
  • max_iters (int) – Total training iterations.
save_checkpoint(out_dir, filename_tmpl='iter_{}.pth', meta=None, save_optimizer=True, create_symlink=True)[source]

Save checkpoint to file.

Parameters:
  • out_dir (str) – Directory to save checkpoint files.
  • filename_tmpl (str, optional) – Checkpoint file template. Defaults to ‘iter_{}.pth’.
  • meta (dict, optional) – Metadata to be saved in checkpoint. Defaults to None.
  • save_optimizer (bool, optional) – Whether save optimizer. Defaults to True.
  • create_symlink (bool, optional) – Whether create symlink to the latest checkpoint file. Defaults to True.
class mmcv.runner.CheckpointHook(interval=-1, by_epoch=True, save_optimizer=True, out_dir=None, max_keep_ckpts=-1, **kwargs)[source]

Save checkpoints periodically.

Parameters:
  • interval (int) – The saving period. If by_epoch=True, interval indicates epochs, otherwise it indicates iterations. Default: -1, which means “never”.
  • by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.
  • save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.
  • out_dir (str, optional) – The directory to save checkpoints. If not specified, runner.work_dir will be used by default.
  • max_keep_ckpts (int, optional) – The maximum checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save the disk space. Default: -1, which means unlimited.
class mmcv.runner.LrUpdaterHook(by_epoch=True, warmup=None, warmup_iters=0, warmup_ratio=0.1, warmup_by_epoch=False)[source]

LR Scheduler in MMCV.

Parameters:
  • by_epoch (bool) – LR changes epoch by epoch
  • warmup (string) – Type of warmup used. It can be None(use no warmup), ‘constant’, ‘linear’ or ‘exp’
  • warmup_iters (int) – The number of iterations or epochs that warmup lasts
  • warmup_ratio (float) – LR used at the beginning of warmup equals to warmup_ratio * initial_lr
  • warmup_by_epoch (bool) – When warmup_by_epoch == True, warmup_iters means the number of epochs that warmup lasts, otherwise means the number of iteration that warmup lasts
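A sketch of the three warmup schedules described above (warmup_lr is a hypothetical standalone helper; the exact formulas inside mmcv may differ slightly). Each schedule starts at base_lr * warmup_ratio and, except for ‘constant’, ramps toward base_lr as warmup ends:

```python
def warmup_lr(base_lr, cur_iter, warmup_iters, warmup_ratio, mode):
    """Compute the warmed-up learning rate at iteration cur_iter."""
    if mode == 'constant':
        return base_lr * warmup_ratio           # flat until warmup ends
    frac = cur_iter / warmup_iters              # warmup progress in [0, 1]
    if mode == 'linear':
        return base_lr * (1 - (1 - frac) * (1 - warmup_ratio))
    if mode == 'exp':
        return base_lr * warmup_ratio ** (1 - frac)
    raise ValueError(f'unknown warmup mode: {mode}')
```

All three agree at the end of warmup (frac = 1 returns base_lr for ‘linear’ and ‘exp’), after which the regular LR schedule takes over.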
class mmcv.runner.LoggerHook(interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]

Base class for logger hooks.

Parameters:
  • interval (int) – Logging interval (every k iterations).
  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval.
  • reset_flag (bool) – Whether to clear the output buffer after logging.
  • by_epoch (bool) – Whether EpochBasedRunner is used.
class mmcv.runner.PaviLoggerHook(init_kwargs=None, add_graph=False, add_last_ckpt=False, interval=10, ignore_last=True, reset_flag=True, by_epoch=True)[source]
class mmcv.runner.TextLoggerHook(by_epoch=True, interval=10, ignore_last=True, reset_flag=False, interval_exp_name=1000)[source]

Logger hook in text.

In this logger hook, the information will be printed on the terminal and saved in a json file.

Parameters:
  • by_epoch (bool) – Whether EpochBasedRunner is used.
  • interval (int) – Logging interval (every k iterations).
  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval.
  • reset_flag (bool) – Whether to clear the output buffer after logging.
  • interval_exp_name (int) – Logging interval for experiment name. This feature is to help users conveniently get the experiment information from screen or log file. Default: 1000.
class mmcv.runner.TensorboardLoggerHook(log_dir=None, interval=10, ignore_last=True, reset_flag=True, by_epoch=True)[source]
class mmcv.runner.WandbLoggerHook(init_kwargs=None, interval=10, ignore_last=True, reset_flag=True)[source]
class mmcv.runner.MlflowLoggerHook(exp_name=None, tags=None, log_model=True, interval=10, ignore_last=True, reset_flag=True)[source]
mmcv.runner.load_state_dict(module, state_dict, strict=False, logger=None)[source]

Load state_dict to a module.

This method is modified from torch.nn.Module.load_state_dict(). The default value of strict is set to False, and messages about parameter mismatches will be shown even if strict is False.

Parameters:
  • module (Module) – Module that receives the state_dict.
  • state_dict (OrderedDict) – Weights.
  • strict (bool) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False.
  • logger (logging.Logger, optional) – Logger to log the error message. If not specified, print function will be used.
mmcv.runner.load_checkpoint(model, filename, map_location=None, strict=False, logger=None)[source]

Load checkpoint from a file or URI.

Parameters:
  • model (Module) – Module to load checkpoint.
  • filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx. Please refer to docs/model_zoo.md for details.
  • map_location (str) – Same as torch.load().
  • strict (bool) – Whether to allow different params for the model and checkpoint.
  • logger (logging.Logger or None) – The logger for error message.
Returns:

The loaded checkpoint.

Return type:

dict or OrderedDict

mmcv.runner.weights_to_cpu(state_dict)[source]

Copy a model state_dict to cpu.

Parameters:state_dict (OrderedDict) – Model weights on GPU.
Returns:Model weights on CPU.
Return type:OrderedDict
mmcv.runner.save_checkpoint(model, filename, optimizer=None, meta=None)[source]

Save checkpoint to file.

The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.

Parameters:
  • model (Module) – Module whose params are to be saved.
  • filename (str) – Checkpoint filename.
  • optimizer (Optimizer, optional) – Optimizer to be saved.
  • meta (dict, optional) – Metadata to be saved in checkpoint.
class mmcv.runner.Priority[source]

Hook priority levels.

Level Value
HIGHEST 0
VERY_HIGH 10
HIGH 30
NORMAL 50
LOW 70
VERY_LOW 90
LOWEST 100
mmcv.runner.get_priority(priority)[source]

Get priority value.

Parameters:priority (int or str or Priority) – Priority.
Returns:The priority value.
Return type:int
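A minimal pure-Python sketch mirroring the priority table and resolution rule above (a standalone re-implementation for illustration, not the mmcv code itself):

```python
from enum import Enum

class Priority(Enum):
    HIGHEST = 0
    VERY_HIGH = 10
    HIGH = 30
    NORMAL = 50
    LOW = 70
    VERY_LOW = 90
    LOWEST = 100

def get_priority(priority):
    """Resolve an int, name string, or Priority member to its int value."""
    if isinstance(priority, Priority):
        return priority.value
    if isinstance(priority, int):
        if not 0 <= priority <= 100:
            raise ValueError('priority must be between 0 and 100')
        return priority
    if isinstance(priority, str):
        return Priority[priority.upper()].value
    raise TypeError('priority must be an int, str or Priority enum')
```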
mmcv.runner.obj_from_dict(info, parent=None, default_args=None)[source]

Initialize an object from dict.

The dict must contain the key “type”, which indicates the object type; it can be either a string or a type, such as “list” or list. The remaining fields are treated as the arguments for constructing the object.

Parameters:
  • info (dict) – Object types and arguments.
  • parent (module) – Module which may contain the expected object classes.
  • default_args (dict, optional) – Default arguments for initializing the object.
Returns:

Object built from the dict.

Return type:

any type
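A minimal sketch of the construction logic described above (a standalone re-implementation for illustration, not the mmcv function itself):

```python
def obj_from_dict(info, parent=None, default_args=None):
    """Build an object from a config dict: pop 'type', resolve it to a
    class (directly, or by name looked up on `parent`), and pass the
    remaining fields plus defaults as keyword arguments."""
    args = dict(info)
    obj_type = args.pop('type')
    if isinstance(obj_type, str):
        obj_type = getattr(parent, obj_type)   # resolve name on parent module
    for name, value in (default_args or {}).items():
        args.setdefault(name, value)           # explicit args win over defaults
    return obj_type(**args)
```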

class mmcv.runner.DefaultOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Default constructor for optimizers.

By default each parameter shares the same optimizer settings, and we provide the argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain the following fields:

  • custom_keys (dict): Specify parameter-wise settings by keys. If one of the keys in custom_keys is a substring of a parameter’s name, then the setting of that parameter will be specified by custom_keys[key], and other settings like bias_lr_mult etc. will be ignored. Note that the matched key is the longest key that is a substring of the parameter’s name. If multiple keys of the same length match, the key that comes first in alphabetical order is chosen. custom_keys[key] should be a dict and may contain the fields lr_mult and decay_mult. See Example 2 below.
  • bias_lr_mult (float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers).
  • bias_decay_mult (float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers and depthwise conv layers).
  • norm_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.
  • dwconv_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.
  • bypass_duplicate (bool): If true, the duplicate parameters would not be added into optimizer. Default: False.
Parameters:
  • model (nn.Module) – The model with parameters to be optimized.
  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are

    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.
  • paramwise_cfg (dict, optional) – Parameter-wise options.
Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001)
>>> paramwise_cfg = dict(norm_decay_mult=0.)
>>> optim_builder = DefaultOptimizerConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
Example 2:
>>> # assume model have attribute model.backbone and model.cls_head
>>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95)
>>> paramwise_cfg = dict(custom_keys={
>>>     '.backbone': dict(lr_mult=0.1, decay_mult=0.9)})
>>> optim_builder = DefaultOptimizerConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
>>> # Then the `lr` and `weight_decay` for model.backbone is
>>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
>>> # model.cls_head is (0.01, 0.95).
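The longest-substring matching rule for custom_keys can be sketched as follows (match_custom_key is a hypothetical helper, not part of mmcv):

```python
def match_custom_key(param_name, custom_keys):
    """Among keys that are substrings of param_name, return the longest;
    break length ties by alphabetical order. Return None if nothing matches."""
    matches = [k for k in custom_keys if k in param_name]
    if not matches:
        return None
    # sort key: longest first, then alphabetically smallest
    return min(matches, key=lambda k: (-len(k), k))
```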
add_params(params, module, prefix='')[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters:
  • params (list[dict]) – A list of param groups, it will be modified in place.
  • module (nn.Module) – The module to be added.
  • prefix (str) – The prefix of the module.
mmcv.runner.set_random_seed(seed, deterministic=False, use_rank_shift=False)[source]

Set random seed.

Parameters:
  • seed (int) – Seed to be used.
  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
  • use_rank_shift (bool) – Whether to add the rank number to the random seed so that different ranks use different seeds. Default: False.

ops

mmcv.ops.bbox_overlaps(bboxes1, bboxes2, mode='iou', aligned=False, offset=0)[source]

Calculate the overlap between two sets of bboxes.

If aligned is False, calculate the ious between each bbox of bboxes1 and bboxes2; otherwise, calculate the ious between each aligned pair of bboxes1 and bboxes2.

Parameters:
  • bboxes1 (Tensor) – shape (m, 4) in <x1, y1, x2, y2> format or empty.
  • bboxes2 (Tensor) – shape (n, 4) in <x1, y1, x2, y2> format or empty. If aligned is True, then m and n must be equal.
  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).
Returns:The IoUs with shape (m, n) if aligned == False, else shape (m, 1).
Return type:Tensor

Example

>>> bboxes1 = torch.FloatTensor([
>>>     [0, 0, 10, 10],
>>>     [10, 10, 20, 20],
>>>     [32, 32, 38, 42],
>>> ])
>>> bboxes2 = torch.FloatTensor([
>>>     [0, 0, 10, 20],
>>>     [0, 10, 10, 19],
>>>     [10, 10, 20, 20],
>>> ])
>>> bbox_overlaps(bboxes1, bboxes2)
tensor([[0.5000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000],
        [0.0000, 0.0000, 0.0000]])

Example

>>> empty = torch.FloatTensor([])
>>> nonempty = torch.FloatTensor([
>>>     [0, 0, 10, 9],
>>> ])
>>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1)
>>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0)
>>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0)
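For reference, a NumPy sketch of the pairwise (aligned=False) computation (bbox_overlaps_np is a hypothetical standalone helper; the real op works on torch tensors and also supports aligned mode):

```python
import numpy as np

def bbox_overlaps_np(b1, b2, mode='iou', offset=0):
    """Pairwise IoU/IoF between (m, 4) and (n, 4) boxes in xyxy format."""
    lt = np.maximum(b1[:, None, :2], b2[None, :, :2])   # (m, n, 2) intersection top-left
    rb = np.minimum(b1[:, None, 2:], b2[None, :, 2:])   # (m, n, 2) intersection bottom-right
    wh = np.clip(rb - lt + offset, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area1 = (b1[:, 2] - b1[:, 0] + offset) * (b1[:, 3] - b1[:, 1] + offset)
    area2 = (b2[:, 2] - b2[:, 0] + offset) * (b2[:, 3] - b2[:, 1] + offset)
    if mode == 'iou':
        union = area1[:, None] + area2[None, :] - inter
    else:  # 'iof': intersection over foreground (area of the first set)
        union = np.broadcast_to(area1[:, None], inter.shape)
    return inter / np.maximum(union, np.finfo(np.float64).eps)
```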
class mmcv.ops.CARAFE(kernel_size, group_size, scale_factor)[source]

CARAFE: Content-Aware ReAssembly of FEatures

Please refer to https://arxiv.org/abs/1905.02188 for more details.

Parameters:
  • kernel_size (int) – reassemble kernel size
  • group_size (int) – reassemble group size
  • scale_factor (int) – upsample ratio
Returns:

upsampled feature map

class mmcv.ops.CARAFENaive(kernel_size, group_size, scale_factor)[source]
class mmcv.ops.CARAFEPack(channels, scale_factor, up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64)[source]

A unified package of CARAFE upsampler that contains: 1) channel compressor 2) content encoder 3) CARAFE op.

Official implementation of ICCV 2019 paper CARAFE: Content-Aware ReAssembly of FEatures Please refer to https://arxiv.org/abs/1905.02188 for more details.

Parameters:
  • channels (int) – input feature channels
  • scale_factor (int) – upsample ratio
  • up_kernel (int) – kernel size of CARAFE op
  • up_group (int) – group size of CARAFE op
  • encoder_kernel (int) – kernel size of content encoder
  • encoder_dilation (int) – dilation of content encoder
  • compressed_channels (int) – output channels of channels compressor
Returns:

upsampled feature map

class mmcv.ops.CornerPool(mode)[source]

Corner Pooling.

Corner Pooling is a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes.

Please refer to https://arxiv.org/abs/1808.01244 for more details. Code is modified from https://github.com/princeton-vl/CornerNet-Lite.

Parameters:mode (str) –

Pooling orientation for the pooling layer

  • ‘bottom’: Bottom Pooling
  • ‘left’: Left Pooling
  • ‘right’: Right Pooling
  • ‘top’: Top Pooling
Returns:Feature map after pooling.
class mmcv.ops.DeformConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deform_groups=1, bias=False)[source]
class mmcv.ops.DeformConv2dPack(*args, **kwargs)[source]

A Deformable Conv Encapsulation that acts as normal Conv layers.

The offset tensor is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
Parameters:
  • in_channels (int) – Same as nn.Conv2d.
  • out_channels (int) – Same as nn.Conv2d.
  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.
  • stride (int or tuple[int]) – Same as nn.Conv2d.
  • padding (int or tuple[int]) – Same as nn.Conv2d.
  • dilation (int or tuple[int]) – Same as nn.Conv2d.
  • groups (int) – Same as nn.Conv2d.
  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
class mmcv.ops.DeformRoIPool(output_size, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
class mmcv.ops.DeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
class mmcv.ops.ModulatedDeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
class mmcv.ops.SigmoidFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
class mmcv.ops.SoftmaxFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
class mmcv.ops.MaskedConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

A MaskedConv2d which inherits the official Conv2d.

The masked forward doesn’t implement the backward function and currently only supports a stride of 1.

class mmcv.ops.ModulatedDeformConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deform_groups=1, bias=True)[source]
class mmcv.ops.ModulatedDeformConv2dPack(*args, **kwargs)[source]

A ModulatedDeformable Conv Encapsulation that acts as normal Conv layers.

Parameters:
  • in_channels (int) – Same as nn.Conv2d.
  • out_channels (int) – Same as nn.Conv2d.
  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.
  • stride (int) – Same as nn.Conv2d, while tuple is not supported.
  • padding (int) – Same as nn.Conv2d, while tuple is not supported.
  • dilation (int) – Same as nn.Conv2d, while tuple is not supported.
  • groups (int) – Same as nn.Conv2d.
  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
mmcv.ops.batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False)[source]

Performs non-maximum suppression in a batched fashion.

Modified from https://github.com/pytorch/vision/blob/505cd6957711af790211896d32b40291bea1bc21/torchvision/ops/boxes.py#L39. In order to perform NMS independently per class, we add an offset to all the boxes. The offset is dependent only on the class idx, and is large enough so that boxes from different classes do not overlap.

Parameters:
  • boxes (torch.Tensor) – boxes in shape (N, 4).
  • scores (torch.Tensor) – scores in shape (N, ).
  • idxs (torch.Tensor) – each index value corresponds to a bbox cluster, and NMS will not be applied between elements of different idxs, shape (N, ).
  • nms_cfg (dict) – specify nms type and other parameters like iou_thr.
  • class_agnostic (bool) – if true, nms is class agnostic, i.e. IoU thresholding happens over all boxes, regardless of the predicted class
Returns:The kept dets and their indices.
Return type:tuple
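The class-offset trick can be sketched as follows (class_offset_boxes is a hypothetical helper; the real implementation works on torch tensors before running a single plain NMS pass):

```python
import numpy as np

def class_offset_boxes(boxes, idxs):
    """Shift each box by a class-dependent offset so boxes of different
    classes can never overlap, letting one plain NMS pass act per-class."""
    max_coord = boxes.max()
    offsets = idxs.astype(boxes.dtype) * (max_coord + 1)  # one coordinate slot per class
    return boxes + offsets[:, None]
```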

mmcv.ops.nms(boxes, scores, iou_threshold, offset=0)[source]

Dispatch to either CPU or GPU NMS implementations.

The input can be either torch tensor or numpy array. GPU NMS will be used if the input is gpu tensor, otherwise CPU NMS will be used. The returned type will always be the same as inputs.

Parameters:
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).
  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).
  • iou_threshold (float) – IoU threshold for NMS.
  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).
Returns:The kept dets (boxes and scores) and their indices, which are always of the same data type as the inputs.
Return type:tuple

Example

>>> boxes = np.array([[49.1, 32.4, 51.0, 35.9],
>>>                   [49.3, 32.9, 51.0, 35.3],
>>>                   [49.2, 31.8, 51.0, 35.4],
>>>                   [35.1, 11.5, 39.1, 15.7],
>>>                   [35.6, 11.8, 39.3, 14.2],
>>>                   [35.3, 11.5, 39.9, 14.5],
>>>                   [35.2, 11.7, 39.7, 15.7]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.4, 0.3], dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = nms(boxes, scores, iou_threshold)
>>> assert len(inds) == len(dets) == 3
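For intuition, a greedy NMS sketch in NumPy (nms_np is a hypothetical standalone helper; the real op dispatches to compiled CPU/GPU kernels):

```python
import numpy as np

def nms_np(boxes, scores, iou_threshold, offset=0):
    """Greedy NMS: keep the highest-scoring box, drop boxes whose IoU with
    it exceeds the threshold, and repeat on the remainder."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + offset) * (y2 - y1 + offset)
    order = scores.argsort()[::-1]        # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.clip(xx2 - xx1 + offset, 0, None)
        h = np.clip(yy2 - yy1 + offset, 0, None)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        order = order[1:][iou <= iou_threshold]   # suppress overlapping boxes
    return np.array(keep)
```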
mmcv.ops.soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5, min_score=0.001, method='linear', offset=0)[source]

Dispatch to only CPU Soft NMS implementations.

The input can be either a torch tensor or numpy array. The returned type will always be the same as inputs.

Parameters:
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).
  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).
  • iou_threshold (float) – IoU threshold for NMS.
  • sigma (float) – hyperparameter for gaussian method
  • min_score (float) – score filter threshold
  • method (str) – either ‘linear’ or ‘gaussian’
  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).
Returns:The kept dets (boxes and scores) and their indices, which are always of the same data type as the inputs.
Return type:tuple

Example

>>> boxes = np.array([[4., 3., 5., 3.],
>>>                   [4., 3., 5., 4.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.4, 0.0], dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = soft_nms(boxes, scores, iou_threshold, sigma=0.5)
>>> assert len(inds) == len(dets) == 5
mmcv.ops.nms_match(dets, iou_threshold)[source]

Matched dets into different groups by NMS.

NMS match is similar to NMS, but when a bbox is suppressed, nms_match records the index of the suppressed bbox and groups it with the index of the kept bbox. Within each group, indices are sorted in score order.

Parameters:
  • dets (torch.Tensor | np.ndarray) – Det boxes with scores, shape (N, 5).
  • iou_threshold (float) – IoU threshold for NMS.
Returns:The outer list corresponds to the different matched groups; each inner Tensor contains the indices of one group, sorted in score order.
Return type:List[torch.Tensor | np.ndarray]

class mmcv.ops.RoIAlign(output_size, spatial_scale=1.0, sampling_ratio=0, pool_mode='avg', aligned=True, use_torchvision=False)[source]

RoI align pooling layer.

Parameters:
  • output_size (tuple) – h, w
  • spatial_scale (float) – scale the input boxes by this number
  • sampling_ratio (int) – number of input samples to take for each output sample. 0 means taking samples densely for current models.
  • pool_mode (str, 'avg' or 'max') – pooling mode in each bin.
  • aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more perfectly.
  • use_torchvision (bool) – whether to use roi_align from torchvision.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors.

This difference does not affect the model’s performance when RoIAlign is used together with conv layers.
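The pixel-neighbor rule described in the note can be sketched directly (neighbor_pixel_indices is a hypothetical helper for illustration):

```python
import math

def neighbor_pixel_indices(c, aligned=True):
    """Return the two pixel indices whose centers straddle continuous
    coordinate c; aligned=True applies the half-pixel shift first."""
    if aligned:
        return math.floor(c - 0.5), math.ceil(c - 0.5)
    return math.floor(c), math.ceil(c)   # legacy behavior, slightly misaligned
```

For c = 1.3 this returns indices (0, 1) when aligned, matching the example in the note, versus (1, 2) in the legacy mode.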

forward(input, rois)[source]
Parameters:
  • input – NCHW images
  • rois – Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.
class mmcv.ops.RoIPool(output_size, spatial_scale=1.0)[source]
class mmcv.ops.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, group=None)[source]
class mmcv.ops.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros')[source]
class mmcv.ops.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: str = 'zeros')[source]
class mmcv.ops.Linear(in_features: int, out_features: int, bias: bool = True)[source]
class mmcv.ops.MaxPool2d(kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...], None] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]
class mmcv.ops.CrissCrossAttention(in_channels)[source]

Criss-Cross Attention Module.

class mmcv.ops.PSAMask(psa_type, mask_size=None)[source]
mmcv.ops.point_sample(input, points, align_corners=False, **kwargs)[source]

A wrapper around grid_sample() to support 3D point_coords tensors. Unlike torch.nn.functional.grid_sample(), it assumes point_coords lie inside the [0, 1] x [0, 1] square.

Parameters:
  • input (Tensor) – Feature map, shape (N, C, H, W).
  • points (Tensor) – Image based absolute point coordinates (normalized), range [0, 1] x [0, 1], shape (N, P, 2) or (N, Hgrid, Wgrid, 2).
  • align_corners (bool) – Whether align_corners. Default: False
Returns:

Features of point on input, shape (N, C, P) or

(N, C, Hgrid, Wgrid).

Return type:

Tensor
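For intuition, a NumPy sketch of bilinear point sampling on a single (C, H, W) feature map, following the align_corners=False convention where pixel centers sit at (i + 0.5) / size (point_sample_np is a hypothetical helper; the real function wraps grid_sample()):

```python
import numpy as np

def point_sample_np(feat, points):
    """Bilinearly sample a (C, H, W) feature map at (P, 2) normalized
    (x, y) points in [0, 1] x [0, 1]. Returns shape (C, P)."""
    C, H, W = feat.shape
    x = points[:, 0] * W - 0.5            # continuous pixel-index coordinates
    y = points[:, 1] * H - 0.5
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx = x - np.floor(x)                  # horizontal interpolation weight
    wy = y - np.floor(y)                  # vertical interpolation weight
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy
```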

mmcv.ops.rel_roi_point_to_rel_img_point(rois, rel_roi_points, img_shape, spatial_scale=1.0)[source]

Convert roi based relative point coordinates to image based relative point coordinates.

Parameters:
  • rois (Tensor) – RoIs or BBoxes, shape (N, 4) or (N, 5)
  • rel_roi_points (Tensor) – Point coordinates inside the RoI, relative to the RoI location, range (0, 1), shape (N, P, 2)
  • img_shape (tuple) – (height, width) of image or feature map.
  • spatial_scale (float) – Scale points by this factor. Default: 1.
Returns:Image based relative point coordinates for sampling, shape (N, P, 2).
Return type:Tensor
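A NumPy sketch of the conversion, assuming (N, 4) xyxy rois (rel_roi_to_rel_img_np is a hypothetical helper; the real function also accepts (N, 5) rois with a batch index column):

```python
import numpy as np

def rel_roi_to_rel_img_np(rois, rel_roi_points, img_shape, spatial_scale=1.0):
    """Map (N, P, 2) RoI-relative points (range (0, 1)) to absolute image
    coordinates, then normalize by (height, width) so the result is again
    relative, suitable for point_sample()."""
    x1, y1, x2, y2 = rois.T
    abs_x = x1[:, None] + rel_roi_points[..., 0] * (x2 - x1)[:, None]
    abs_y = y1[:, None] + rel_roi_points[..., 1] * (y2 - y1)[:, None]
    h, w = img_shape
    rel_x = abs_x * spatial_scale / w     # normalize back to (0, 1) range
    rel_y = abs_y * spatial_scale / h
    return np.stack([rel_x, rel_y], axis=-1)
```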

class mmcv.ops.SimpleRoIAlign(output_size, spatial_scale, aligned=True)[source]
class mmcv.ops.SAConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, use_deform=False)[source]

SAC (Switchable Atrous Convolution)

This is an implementation of SAC in DetectoRS (https://arxiv.org/pdf/2006.02334.pdf).

Parameters:
  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
  • use_deform (bool, optional) – If True, replace convolution with deformable convolution. Default: False.