Subclassing ndarray — NumPy v2.0 Manual (2024)

Introduction#

Subclassing ndarray is relatively simple, but it has some complications compared to other Python objects. On this page we explain the machinery that allows you to subclass ndarray, and the implications for implementing a subclass.

ndarrays and object creation#

Subclassing ndarray is complicated by the fact that new instances of ndarray classes can come about in three different ways. These are:

Explicit constructor call - as in MySubClass(params). This is the usual route to Python instance creation.
View casting - casting an existing ndarray as a given subclass
New from template - creating a new instance from a template instance. Examples include returning slices from a subclassed array, creating return types from ufuncs, and copying arrays. See Creating new from template for more details.

The last two are characteristics of ndarrays - in order to support things like array slicing. The complications of subclassing ndarray are due to the mechanisms numpy has to support these latter two routes of instance creation.

When to use subclassing#

Besides the additional complexities of subclassing a NumPy array, subclasses can run into unexpected behaviour because some functions may convert the subclass to a baseclass and “forget” any additional information associated with the subclass. This can result in surprising behavior if you use NumPy methods or functions you have not explicitly tested.
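As a small illustration of this "forgetting", the sketch below uses a hypothetical Tagged subclass (the class name and its tag attribute are made up for this example). It contrasts np.asarray, which returns a base-class ndarray, with np.asanyarray, which passes subclasses through.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical subclass that carries one extra attribute."""

    def __array_finalize__(self, obj):
        # Give every new instance a 'tag', copied from the source if present.
        self.tag = getattr(obj, 'tag', None)

t = np.arange(4).view(Tagged)
t.tag = 'metres'

kept = np.asanyarray(t)   # subclasses are passed through unchanged
lost = np.asarray(t)      # converted to a plain ndarray

print(type(kept).__name__, kept.tag)
print(type(lost).__name__, hasattr(lost, 'tag'))
```

Any function that internally calls np.asarray on its inputs will behave like the second case, which is why untested NumPy functions can silently drop subclass information.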

On the other hand, compared to other interoperability approaches, subclassing can be useful because many things will “just work”.

This means that subclassing can be a convenient approach, and for a long time it was also often the only available approach. However, NumPy now provides additional interoperability protocols described in “Interoperability with NumPy”. For many use-cases these interoperability protocols may now be a better fit or supplement the use of subclassing.

Subclassing can be a good fit if:

you are less worried about maintainability or users other than yourself: A subclass will be faster to implement and additional interoperability can be added “as-needed”. And with few users, possible surprises are not an issue.
you do not think it is problematic if the subclass information is ignored or lost silently. An example is np.memmap where “forgetting” about data being memory mapped cannot lead to a wrong result. An example of a subclass that sometimes confuses users is NumPy’s masked arrays. When they were introduced, subclassing was the only approach for implementation. However, today we would possibly try to avoid subclassing and rely only on interoperability protocols.

Note that subclass authors may also wish to study Interoperability with NumPy to support more complex use-cases or work around the surprising behavior.

astropy.units.Quantity and xarray are examples of array-like objects that interoperate well with NumPy. Astropy’s Quantity is an example which uses a dual approach of both subclassing and interoperability protocols.

View casting#

View casting is the standard ndarray mechanism by which you take an ndarray of any subclass, and return a view of the array as another (specified) subclass:

>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class '__main__.C'>

Creating new from template#

New instances of an ndarray subclass can also come about by a very similar mechanism to View casting, when numpy finds it needs to create a new instance from a template instance. The most obvious place this has to happen is when you are taking slices of subclassed arrays. For example:

>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class '__main__.C'>
>>> v is c_arr # but it's a new instance
False

The slice is a view onto the original c_arr data. So, when we take a view from the ndarray, we return a new ndarray, of the same class, that points to the data in the original.

There are other points in the use of ndarrays where we need such views, such as copying arrays (c_arr.copy()), creating ufunc output arrays (see also __array_wrap__ for ufuncs and other functions), and reducing methods (like c_arr.mean()).
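These cases can be checked directly. The sketch below redefines the do-nothing subclass C from the view casting example and simply inspects the type of each result; the results dictionary is only a convenience for this illustration.

```python
import numpy as np

class C(np.ndarray):
    pass

c_arr = np.zeros((3,)).view(C)

results = {
    'slice': c_arr[1:],          # new-from-template via slicing
    'copy': c_arr.copy(),        # new-from-template via the copy method
    'ufunc': np.add(c_arr, 1),   # ufunc output array
}
for name, value in results.items():
    print(name, type(value).__name__)
```

Each entry is a new object of class C, even though C defines no methods of its own.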

Relationship of view casting and new-from-template#

These paths both use the same machinery. We make the distinction here, because they result in different input to your methods. Specifically, View casting means you have created a new instance of your array type from any potential subclass of ndarray. Creating new from template means you have created a new instance of your class from a pre-existing instance, allowing you - for example - to copy across attributes that are particular to your subclass.

Implications for subclassing#

If we subclass ndarray, we need to deal not only with explicit construction of our array type, but also View casting or Creating new from template. NumPy has the machinery to do this, and it is this machinery that makes subclassing slightly non-standard.

There are two aspects to the machinery that ndarray uses to support views and new-from-template in subclasses.

The first is the use of the ndarray.__new__ method for the main work of object initialization, rather than the more usual __init__ method. The second is the use of the __array_finalize__ method to allow subclasses to clean up after the creation of views and new instances from templates.

A brief Python primer on `__new__` and `__init__`#

__new__ is a standard Python method, and, if present, is called before __init__ when we create a class instance. See the python __new__ documentation for more detail.

For example, consider the following Python code:

>>> class C:
...     def __new__(cls, *args):
...         print('Cls in __new__:', cls)
...         print('Args in __new__:', args)
...         # The `object` type __new__ method takes a single argument.
...         return object.__new__(cls)
...     def __init__(self, *args):
...         print('type(self) in __init__:', type(self))
...         print('Args in __init__:', args)

meaning that we get:

>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)

When we call C('hello'), the __new__ method gets its own class as first argument, and the passed argument, which is the string 'hello'. After python calls __new__, it usually (see below) calls our __init__ method, with the output of __new__ as the first argument (now a class instance), and the passed arguments following.
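The "usually" matters: Python only calls __init__ when __new__ returns an instance of the class being constructed. A minimal sketch of the exception, using hypothetical class names Plain and Factory:

```python
class Plain:
    pass

class Factory:
    init_calls = 0

    def __new__(cls, *args):
        # Return an object that is NOT an instance of cls.
        return Plain()

    def __init__(self, *args):
        # Never reached here, because __new__ did not return a Factory.
        Factory.init_calls += 1

obj = Factory('hello')
print(type(obj).__name__, Factory.init_calls)
```

Because __new__ fully controls what comes back from the constructor call, it is the natural place for ndarray subclasses to do their construction work.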

The role of `__array_finalize__`#

__array_finalize__ is the mechanism that numpy provides to allow subclasses to handle the various ways that new instances get created.

Remember that subclass instances can come about in these three ways:

explicit constructor call (obj = MySubClass(params)). This will call the usual sequence of MySubClass.__new__ then (if it exists) MySubClass.__init__.
View casting
Creating new from template

Our MySubClass.__new__ method only gets called in the case of the explicit constructor call, so we can’t rely on MySubClass.__new__ or MySubClass.__init__ to deal with the view casting and new-from-template. It turns out that MySubClass.__array_finalize__ does get called for all three methods of object creation, so this is where our object creation housekeeping usually goes.

For the explicit constructor call, our subclass will need to create a new ndarray instance of its own class. In practice this means that we, the authors of the code, will need to make a call to ndarray.__new__(MySubClass,...), a class-hierarchy prepared call to super().__new__(cls, ...), or do view casting of an existing array (see below).
For view casting and new-from-template, the equivalent of ndarray.__new__(MySubClass,...) is called, at the C level.

The arguments that __array_finalize__ receives differ for the three methods of instance creation above.

The following code allows us to look at the call sequences and arguments:

import numpy as np

class C(np.ndarray):
    def __new__(cls, *args, **kwargs):
        print('In __new__ with class %s' % cls)
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # in practice you probably will not need or want an __init__
        # method for your subclass
        print('In __init__ with class %s' % self.__class__)

    def __array_finalize__(self, obj):
        print('In array_finalize:')
        print('   self type is %s' % type(self))
        print('   obj type is %s' % type(obj))

Now:

>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>

The signature of __array_finalize__ is:

def __array_finalize__(self, obj):

One sees that the super call, which goes to ndarray.__new__, passes __array_finalize__ the new object, of our own class (self) as well as the object from which the view has been taken (obj). As you can see from the output above, the self is always a newly created instance of our subclass, and the type of obj differs for the three instance creation methods:

When called from the explicit constructor, obj is None
When called from view casting, obj can be an instance of any subclass of ndarray, including our own.
When called in new-from-template, obj is another instance of our own subclass, that we might use to update the new self instance.

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.

This may be clearer with an example.

Simple example - adding an extra attribute to ndarray#

import numpy as np

class InfoArray(np.ndarray):

    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None, info=None):
        # Create the ndarray instance of our type, given the usual
        # ndarray input arguments. This will call the standard
        # ndarray constructor, but return an object of our type.
        # It also triggers a call to InfoArray.__array_finalize__
        obj = super().__new__(subtype, shape, dtype,
                              buffer, offset, strides, order)
        # set the new 'info' attribute to the value passed
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # ``self`` is a new object resulting from
        # ndarray.__new__(InfoArray, ...), therefore it only has
        # attributes that the ndarray.__new__ constructor gave it -
        # i.e. those of a standard ndarray.
        #
        # We could have got to the ndarray.__new__ call in 3 ways:
        # From an explicit constructor - e.g. InfoArray():
        #    obj is None
        #    (we're in the middle of the InfoArray.__new__
        #    constructor, and self.info will be set when we return to
        #    InfoArray.__new__)
        if obj is None: return
        # From view casting - e.g arr.view(InfoArray):
        #    obj is arr
        #    (type(obj) can be InfoArray)
        # From new-from-template - e.g infoarr[:3]
        #    type(obj) is InfoArray
        #
        # Note that it is here, rather than in the __new__ method,
        # that we set the default value for 'info', because this
        # method sees all creation of default objects - with the
        # InfoArray.__new__ constructor, but also with
        # arr.view(InfoArray).
        self.info = getattr(obj, 'info', None)
        # We do not need to return anything

Using the object looks like this:

>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True

This class isn’t very useful, because it has the same constructor as the bare ndarray object, including passing in buffers and shapes and so on. We would probably prefer the constructor to be able to take an already formed ndarray from the usual numpy calls to np.array and return an object.

Slightly more realistic example - attribute added to existing array#

Here is a class that takes a standard ndarray that already exists, casts as our type, and adds an extra attribute.

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

So:

>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
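To see which operations carry info along, the sketch below repeats the class in compact form and tries a copy, an arithmetic operation, and a conversion with np.asarray. The variable names copied, doubled and plain are only for this illustration.

```python
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

obj = RealisticInfoArray(np.arange(5), info='information')

copied = obj.copy()        # new-from-template
doubled = obj * 2          # ufunc output, wrapped back into the subclass
plain = np.asarray(obj)    # base-class ndarray, the attribute is gone

print(copied.info, doubled.info, hasattr(plain, 'info'))
```

The first two results go through __array_finalize__ with obj set to the original instance, so the attribute is copied; the third leaves the subclass entirely.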

`__array_ufunc__` for ufuncs#

New in version 1.13.

A subclass can override what happens when executing numpy ufuncs on it by overriding the default ndarray.__array_ufunc__ method. This method is executed instead of the ufunc and should return either the result of the operation, or NotImplemented if the operation requested is not implemented.

The signature of __array_ufunc__ is:

def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):

ufunc is the ufunc object that was called.
method is a string indicating how the ufunc was called, either "__call__" to indicate it was called directly, or one of its methods: "reduce", "accumulate", "reduceat", "outer", or "at".
inputs is a tuple of the input arguments to the ufunc
kwargs contains any optional or keyword arguments passed to the function. This includes any out arguments, which are always contained in a tuple.
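To make these arguments concrete, here is a minimal sketch of a hypothetical Logged subclass (the class and its calls list are not part of NumPy) that records the ufunc name, the method string and the number of inputs before delegating to the real ufunc.

```python
import numpy as np

class Logged(np.ndarray):
    """Hypothetical subclass that records how each ufunc was invoked."""
    calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Logged.calls.append((ufunc.__name__, method, len(inputs)))
        # Strip our own type so the ufunc call below does not recurse.
        inputs = tuple(x.view(np.ndarray) if isinstance(x, Logged) else x
                       for x in inputs)
        if 'out' in kwargs:
            # 'out' always arrives as a tuple, even for a single output.
            kwargs['out'] = tuple(
                o.view(np.ndarray) if isinstance(o, Logged) else o
                for o in kwargs['out'])
        return getattr(ufunc, method)(*inputs, **kwargs)

x = np.arange(4.0).view(Logged)
np.add(x, 1)        # method is '__call__', two inputs
np.add.reduce(x)    # method is 'reduce', one input
print(Logged.calls)
```

This sketch returns plain ndarrays and scalars; a real subclass would normally convert the results back to its own type.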

A typical implementation would convert any inputs or outputs that are instances of one’s own class, pass everything on to a superclass using super(), and finally return the results after possible back-conversion. An example, taken from the test case test_ufunc_override_with_super in _core/tests/test_umath.py, is the following.

import numpy as np

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        args = []
        in_no = []
        for i, input_ in enumerate(inputs):
            if isinstance(input_, A):
                in_no.append(i)
                args.append(input_.view(np.ndarray))
            else:
                args.append(input_)

        outputs = out
        out_no = []
        if outputs:
            out_args = []
            for j, output in enumerate(outputs):
                if isinstance(output, A):
                    out_no.append(j)
                    out_args.append(output.view(np.ndarray))
                else:
                    out_args.append(output)
            kwargs['out'] = tuple(out_args)
        else:
            outputs = (None,) * ufunc.nout

        info = {}
        if in_no:
            info['inputs'] = in_no
        if out_no:
            info['outputs'] = out_no

        results = super().__array_ufunc__(ufunc, method, *args, **kwargs)
        if results is NotImplemented:
            return NotImplemented

        if method == 'at':
            if isinstance(inputs[0], A):
                inputs[0].info = info
            return

        if ufunc.nout == 1:
            results = (results,)

        results = tuple((np.asarray(result).view(A)
                         if output is None else output)
                        for result, output in zip(results, outputs))
        if results and isinstance(results[0], A):
            results[0].info = info

        return results[0] if len(results) == 1 else results

So, this class does not actually do anything interesting: it just converts any instances of its own to regular ndarray (otherwise, we’d get infinite recursion!), and adds an info dictionary that tells which inputs and outputs it converted. Hence, e.g.,

>>> a = np.arange(5.).view(A)
>>> b = np.sin(a)
>>> b.info
{'inputs': [0]}
>>> b = np.sin(np.arange(5.), out=(a,))
>>> b.info
{'outputs': [0]}
>>> a = np.arange(5.).view(A)
>>> b = np.ones(1).view(A)
>>> c = a + b
>>> c.info
{'inputs': [0, 1]}
>>> a += b
>>> a.info
{'inputs': [0, 1], 'outputs': [0]}

Note that another approach would be to use getattr(ufunc, method)(*inputs, **kwargs) instead of the super call. For this example, the result would be identical, but there is a difference if another operand also defines __array_ufunc__. E.g., let’s assume that we evaluate np.add(a, b), where b is an instance of another class B that has an override. If you use super as in the example, ndarray.__array_ufunc__ will notice that b has an override, which means it cannot evaluate the result itself. Thus, it will return NotImplemented and so will our class A. Then, control will be passed over to b, which either knows how to deal with us and produces a result, or does not and returns NotImplemented, raising a TypeError.

If instead, we replace our super call with getattr(ufunc, method), we effectively do np.add(a.view(np.ndarray), b). Again, B.__array_ufunc__ will be called, but now it sees an ndarray as the other argument. Likely, it will know how to handle this, and return a new instance of the B class to us. Our example class is not set up to handle this, but it might well be the best approach if, e.g., one were to re-implement MaskedArray using __array_ufunc__.
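The difference between the two routes can be observed with a small experiment. This is only a sketch: ASuper, AGetattr and B are hypothetical classes, and B reports the types of the inputs it receives instead of computing anything.

```python
import numpy as np

class B:
    """Hypothetical non-ndarray operand with its own override."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Report what B was handed rather than computing a result.
        return [type(x).__name__ for x in inputs]

class ASuper(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = [x.view(np.ndarray) if isinstance(x, ASuper) else x
                for x in inputs]
        # ndarray.__array_ufunc__ sees B's override and returns
        # NotImplemented, so NumPy then asks B with the original inputs.
        return super().__array_ufunc__(ufunc, method, *args, **kwargs)

class AGetattr(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = [x.view(np.ndarray) if isinstance(x, AGetattr) else x
                for x in inputs]
        # Calling the ufunc again dispatches to B with the stripped inputs.
        return getattr(ufunc, method)(*args, **kwargs)

b = B()
seen_super = np.add(np.arange(3).view(ASuper), b)
seen_getattr = np.add(np.arange(3).view(AGetattr), b)
print(seen_super)
print(seen_getattr)
```

With the super route B receives the subclass instance; with the getattr route it receives a plain ndarray.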

As a final note: if the super route is suited to a given class, an advantage of using it is that it helps in constructing class hierarchies. E.g., suppose that our other class B also used the super in its __array_ufunc__ implementation, and we created a class C that depended on both, i.e., class C(A, B) (with, for simplicity, not another __array_ufunc__ override). Then any ufunc on an instance of C would pass on to A.__array_ufunc__, the super call in A would go to B.__array_ufunc__, and the super call in B would go to ndarray.__array_ufunc__, thus allowing A and B to collaborate.
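The call chain described here can be traced with a stripped-down sketch. The A and B below are simplified stand-ins for the classes discussed above (they only strip their own instances and record that they were called), and the order list is a hypothetical helper for this illustration.

```python
import numpy as np

order = []

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        order.append('A')
        inputs = tuple(x.view(np.ndarray) if isinstance(x, A) else x
                       for x in inputs)
        return super().__array_ufunc__(ufunc, method, *inputs, **kwargs)

class B(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        order.append('B')
        inputs = tuple(x.view(np.ndarray) if isinstance(x, B) else x
                       for x in inputs)
        return super().__array_ufunc__(ufunc, method, *inputs, **kwargs)

class C(A, B):
    # No override of its own: the method resolution order is C, A, B, ndarray.
    pass

c = np.arange(3.0).view(C)
result = np.negative(c)
print(order, type(result).__name__)
```

Because neither stand-in converts the result back, the final value here is a plain ndarray; a full implementation would re-wrap it as in the longer example.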

`__array_wrap__` for ufuncs and other functions#

Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using __array_wrap__ and __array_prepare__ (the latter is now removed). These two allowed one to change the output type of a ufunc, but, in contrast to __array_ufunc__, did not allow one to make any changes to the inputs. It is hoped to eventually deprecate these, but __array_wrap__ is also used by other numpy functions and methods, such as squeeze, so at the present time is still needed for full functionality.

Conceptually, __array_wrap__ “wraps up the action” in the sense of allowing a subclass to set the type of the return value and update attributes and metadata. Let’s show how this works with an example. First we return to the simpler example subclass, but with a different name and some print statements:

import numpy as np

class MySubClass(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        print('In __array_finalize__:')
        print('   self is %s' % repr(self))
        print('   obj is %s' % repr(obj))
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        print('In __array_wrap__:')
        print('   self is %s' % repr(self))
        print('   arr is %s' % repr(out_arr))
        # then just call the parent; super() already binds self,
        # so it must not be passed again
        return super().__array_wrap__(out_arr, context, return_scalar)

We run a ufunc on an instance of our new array:

>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
   self is MySubClass([0, 1, 2, 3, 4])
   obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
   self is MySubClass([0, 1, 2, 3, 4])
   arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
   self is MySubClass([1, 3, 5, 7, 9])
   obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'

Note that the ufunc (np.add) has called the __array_wrap__ method with arguments self as obj, and out_arr as the (ndarray) result of the addition. In turn, the default __array_wrap__ (ndarray.__array_wrap__) has cast the result to class MySubClass, and called __array_finalize__ - hence the copying of the info attribute. This has all happened at the C level.

But, we could do anything we wanted:

class SillySubClass(np.ndarray):

    def __array_wrap__(self, arr, context=None, return_scalar=False):
        return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'

So, by defining a specific __array_wrap__ method for our subclass, we can tweak the output from ufuncs. The __array_wrap__ method requires self, then an argument - which is the result of the ufunc or another NumPy function - and an optional parameter context. This parameter is passed by ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc, domain of the ufunc), but is not passed by other numpy functions. Though, as seen above, it is possible to do otherwise, __array_wrap__ should return an instance of its containing class. See the masked array subclass for an implementation. __array_wrap__ is always passed a NumPy array which may or may not be a subclass (usually of the caller).
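A less silly use of __array_wrap__ is to update metadata on the result. The sketch below is an assumption-laden illustration: Stamped and its made_by attribute are hypothetical, the first element of context is read defensively (it is treated as either a ufunc object or a name), and the result is built by view casting rather than by calling the parent method so that the sketch does not depend on the parent's signature.

```python
import numpy as np

class Stamped(np.ndarray):
    """Hypothetical subclass that remembers which ufunc produced it."""

    def __array_finalize__(self, obj):
        self.made_by = getattr(obj, 'made_by', None)

    def __array_wrap__(self, arr, context=None, return_scalar=False):
        # Cast the raw result to our own class; this also runs
        # __array_finalize__ on the new instance.
        result = np.asarray(arr).view(type(self))
        if context is not None:
            # Assumption: context[0] is the ufunc object or its name.
            result.made_by = getattr(context[0], '__name__', context[0])
        return result

x = np.arange(4.0).view(Stamped)
y = np.sqrt(x)
print(type(y).__name__, y.made_by)
```

Note that this sketch ignores return_scalar, so zero-dimensional results stay arrays rather than becoming scalars.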

Extra gotchas - custom `__del__` methods and ndarray.base#

One of the problems that ndarray solves is keeping track of memory ownership of ndarrays and their views. Consider the case where we have created an ndarray, arr and have taken a slice with v = arr[1:]. The two objects are looking at the same memory. NumPy keeps track of where the data came from for a particular array or view, with the base attribute:

>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the original array that it was derived from
>>> v2.base is arr
True

In general, if the array owns its own memory, as for arr in this case, then arr.base will be None - there are some exceptions to this - see the numpy book for more details.

The base attribute is useful in being able to tell whether we have a view or the original array. This in turn can be useful if we need to know whether or not to do some specific cleanup when the subclassed array is deleted. For example, we may only want to do the cleanup if the original array is deleted, but not the views. For an example of how this can work, have a look at the memmap class in numpy._core.
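As a rough sketch of that idea, the hypothetical Owned class below uses base inside __del__ to tell the owning array from a view. It is not how memmap is implemented, and the order of the recorded events relies on CPython's reference counting, where del triggers finalization immediately.

```python
import numpy as np

class Owned(np.ndarray):
    """Hypothetical subclass that distinguishes owner and view on deletion."""
    events = []

    def __new__(cls, n):
        # Allocate fresh memory, so the new instance owns its data.
        return super().__new__(cls, (n,), dtype=float)

    def __del__(self):
        # Real cleanup would go in the 'owner' branch only.
        kind = 'owner' if self.base is None else 'view'
        Owned.events.append(kind)

arr = Owned(4)
v = arr[1:]     # a view: v.base is arr
del v           # finalizes the view first
del arr         # then the owner
print(Owned.events)
```

A view holds a reference to its base, so the owner cannot be finalized while any view of it is still alive.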

Subclassing and downstream compatibility#

When sub-classing ndarray or creating duck-types that mimic the ndarray interface, it is your responsibility to decide how aligned your APIs will be with those of numpy. For convenience, many numpy functions that have a corresponding ndarray method (e.g., sum, mean, take, reshape) work by checking if the first argument to a function has a method of the same name. If it exists, the method is called instead of coercing the arguments to a numpy array.

For example, if you want your sub-class or duck-type to be compatible with numpy’s sum function, the method signature for this object’s sum method should be the following:

def sum(self, axis=None, dtype=None, out=None, keepdims=False):
    ...

This is the exact same method signature for np.sum, so now if a user calls np.sum on this object, numpy will call the object’s own sum method and pass in these arguments enumerated above in the signature, and no errors will be raised because the signatures are completely compatible with each other.

If, however, you decide to deviate from this signature and do something like this:

def sum(self, axis=None, dtype=None):
    ...

This object is no longer compatible with np.sum because if you call np.sum, it will pass in unexpected arguments out and keepdims, causing a TypeError to be raised.

If you wish to maintain compatibility with numpy and its subsequent versions (which might add new keyword arguments) but do not want to surface all of numpy’s arguments, your function’s signature should accept **kwargs. For example:

def sum(self, axis=None, dtype=None, **unused_kwargs):
    ...

This object is now compatible with np.sum again because any extraneous arguments (i.e. keywords that are not axis or dtype) will be hidden away in the **unused_kwargs parameter.
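The two signatures can be compared with a pair of hypothetical duck-types, Strict and Tolerant, which wrap a Python list purely for illustration. The sketch records whether np.sum succeeds for each.

```python
import numpy as np

class Strict:
    """Hypothetical duck-type whose sum() omits some of np.sum's arguments."""
    def __init__(self, data):
        self.data = list(data)

    def sum(self, axis=None, dtype=None):
        return sum(self.data)

class Tolerant(Strict):
    """Same data, but sum() absorbs any extra keyword arguments."""
    def sum(self, axis=None, dtype=None, **unused_kwargs):
        return sum(self.data)

total = np.sum(Tolerant([1, 2, 3]))

try:
    np.sum(Strict([1, 2, 3]))
    strict_ok = True
except TypeError:
    # np.sum forwarded a keyword that Strict.sum does not accept.
    strict_ok = False

print(total, strict_ok)
```

Accepting **unused_kwargs trades strictness for forward compatibility: misspelled keywords will be swallowed silently as well.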