"""
=============================
Subclassing ndarray in python
=============================

Credits
-------

This page is based with thanks on the wiki page on subclassing by Pierre
Gerard-Marchant - http://www.scipy.org/Subclasses.

Introduction
------------

Subclassing ndarray is relatively simple, but it has some complications
compared to other Python objects. On this page we explain the machinery
that allows you to subclass ndarray, and the implications for
implementing a subclass.

ndarrays and object creation
============================

Subclassing ndarray is complicated by the fact that new instances of
ndarray classes can come about in three different ways. These are:

#. Explicit constructor call - as in ``MySubClass(params)``. This is
   the usual route to Python instance creation.
#. View casting - casting an existing ndarray as a given subclass.
#. New from template - creating a new instance from a template
   instance. Examples include returning slices from a subclassed array,
   creating return types from ufuncs, and copying arrays. See
   :ref:`new-from-template` for more details.

The last two are characteristics of ndarrays, needed to support
things like array slicing. The complications of subclassing ndarray are
due to the mechanisms numpy has to support these latter two routes of
instance creation.

.. _view-casting:

View casting
------------

*View casting* is the standard ndarray mechanism by which you take an
ndarray of any subclass, and return a view of the array as another
(specified) subclass:

>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>
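
View casting produces a *view*, so no data is copied. The new subclass
instance and the original array share one buffer. The minimal sketch
below checks this by writing through the view. It also uses
``np.shares_memory``, which assumes a NumPy recent enough to provide it
(1.11 or later). The class name ``Viewed`` is a hypothetical stand-in for
any subclass:

.. testcode::

    import numpy as np

    class Viewed(np.ndarray):
        # a do-nothing subclass, used only to demonstrate view casting
        pass

    base_arr = np.zeros((3,))
    viewed = base_arr.view(Viewed)

    # writing through the subclass view changes the original array
    viewed[0] = 42.0

>>> type(viewed) is Viewed
True
>>> np.shares_memory(base_arr, viewed)
True
>>> float(base_arr[0])
42.0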

.. _new-from-template:

Creating new from template
--------------------------

New instances of an ndarray subclass can also come about by a very
similar mechanism to :ref:`view-casting`, when numpy finds it needs to
create a new instance from a template instance. The most obvious place
this has to happen is when you are taking slices of subclassed arrays.
For example:

>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False

The slice is a *view* onto the original ``c_arr`` data. So, when we
take a view from the ndarray, we return a new ndarray, of the same
class, that points to the data in the original.

There are other points in the use of ndarrays where we need such views,
such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
(see also :ref:`array-wrap`), and reducing methods (like
``c_arr.mean()``).
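
Copying and ufunc outputs behave the same way as slicing. The sketch
below uses a hypothetical do-nothing subclass ``Tmpl`` to check that a
copy and a ufunc result both come back as the subclass rather than as a
plain ndarray. Unlike a slice, the copy does not share memory with its
template:

.. testcode::

    import numpy as np

    class Tmpl(np.ndarray):
        # a do-nothing subclass, used only to observe the returned types
        pass

    t = np.arange(4.0).view(Tmpl)

    copied = t.copy()        # new-from-template via copying
    summed = np.add(t, 1.0)  # new-from-template via a ufunc output

>>> type(copied) is Tmpl, type(summed) is Tmpl
(True, True)
>>> np.shares_memory(t, copied)
False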

Relationship of view casting and new-from-template
--------------------------------------------------

These paths both use the same machinery. We make the distinction here,
because they result in different input to your methods. Specifically,
:ref:`view-casting` means you have created a new instance of your array
type from any potential subclass of ndarray. :ref:`new-from-template`
means you have created a new instance of your class from a pre-existing
instance, allowing you - for example - to copy across attributes that
are particular to your subclass.

Implications for subclassing
----------------------------

If we subclass ndarray, we need to deal not only with explicit
construction of our array type, but also :ref:`view-casting` or
:ref:`new-from-template`. Numpy has the machinery to do this, and it is
this machinery that makes subclassing slightly non-standard.

There are two aspects to the machinery that ndarray uses to support
views and new-from-template in subclasses.

The first is the use of the ``ndarray.__new__`` method for the main work
of object initialization, rather than the more usual ``__init__``
method. The second is the use of the ``__array_finalize__`` method to
allow subclasses to clean up after the creation of views and new
instances from templates.

A brief Python primer on ``__new__`` and ``__init__``
=====================================================

``__new__`` is a standard Python method, and, if present, is called
before ``__init__`` when we create a class instance. See the `python
__new__ documentation
<http://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail.

For example, consider the following Python code:

.. testcode::

    class C(object):
        def __new__(cls, *args):
            print('Cls in __new__:', cls)
            print('Args in __new__:', args)
            # object.__new__ takes only the class; passing *args on to it
            # raises TypeError in Python 3
            return object.__new__(cls)

        def __init__(self, *args):
            print('type(self) in __init__:', type(self))
            print('Args in __init__:', args)

meaning that we get:

>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)

When we call ``C('hello')``, the ``__new__`` method gets its own class
as first argument, and the passed argument, which is the string
``'hello'``. After python calls ``__new__``, it usually (see below)
calls our ``__init__`` method, with the output of ``__new__`` as the
first argument (now a class instance), and the passed arguments
following.

As you can see, the object can be initialized in the ``__new__``
method or the ``__init__`` method, or both, and in fact ndarray does
not have an ``__init__`` method, because all the initialization is
done in the ``__new__`` method.

Why use ``__new__`` rather than just the usual ``__init__``? Because
in some cases, as for ndarray, we want to be able to return an object
of some other class. Consider the following:

.. testcode::

    class D(C):
        def __new__(cls, *args):
            print('D cls is:', cls)
            print('D args in __new__:', args)
            return C.__new__(C, *args)

        def __init__(self, *args):
            # we never get here
            print('In D __init__')

meaning that:

>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>

The definition of ``C`` is the same as before, but for ``D``, the
``__new__`` method returns an instance of class ``C`` rather than
``D``. Note that the ``__init__`` method of ``D`` does not get
called. In general, when the ``__new__`` method returns an object that
is not an instance of the class in which it is defined, the ``__init__``
method of that class is not called.
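
Python decides this with an ``isinstance`` check. After ``__new__``
returns, ``__init__`` runs only if the returned object is an instance of
the class that was called. The small sketch below shows both outcomes.
It uses hypothetical classes ``Base`` and ``Derived`` that simply count
``__init__`` calls:

.. testcode::

    class Base(object):
        init_calls = 0

        def __init__(self, *args):
            Base.init_calls += 1

    class Derived(Base):
        def __new__(cls, as_base=False):
            # return a plain Base instance on request, otherwise a Derived
            target = Base if as_base else cls
            return object.__new__(target)

    d = Derived()                # a Derived instance: __init__ runs
    b = Derived(as_base=True)    # a Base, not a Derived: __init__ skipped

>>> type(d).__name__, type(b).__name__
('Derived', 'Base')
>>> Base.init_calls
1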

This is how subclasses of the ndarray class are able to return views
that preserve the class type. When taking a view, the standard
ndarray machinery creates the new ndarray object with something
like::

    obj = ndarray.__new__(subtype, shape, ...

where ``subtype`` is the subclass. Thus the returned view is of the
same class as the subclass, rather than being of class ``ndarray``.

That solves the problem of returning views of the same type, but now
we have a new problem. The machinery of ndarray can set the class
this way, in its standard methods for taking views, but the ndarray
``__new__`` method knows nothing of what we have done in our own
``__new__`` method in order to set attributes, and so on. (Aside -
why not call ``obj = subtype.__new__(...`` then? Because we may not
have a ``__new__`` method with the same call signature).
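
A small experiment makes the new problem described above concrete. In
the sketch below, the hypothetical subclass ``Tagged`` sets a ``tag``
attribute in its own ``__new__``, which deliberately has a non-ndarray
signature, and counts how often that ``__new__`` runs. Slicing goes
through the ndarray machinery instead. The slice therefore has the right
class but no ``tag``, because ``Tagged`` defines no
``__array_finalize__``:

.. testcode::

    import numpy as np

    class Tagged(np.ndarray):
        new_calls = 0

        def __new__(cls, data, tag):
            # signature differs from ndarray.__new__(subtype, shape, ...)
            Tagged.new_calls += 1
            obj = np.asarray(data).view(cls)
            obj.tag = tag
            return obj

    t = Tagged([1, 2, 3], tag='x')
    s = t[1:]   # created by the ndarray machinery, not by Tagged.__new__

>>> Tagged.new_calls
1
>>> type(s) is Tagged, hasattr(s, 'tag')
(True, False)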

The role of ``__array_finalize__``
==================================

``__array_finalize__`` is the mechanism that numpy provides to allow
subclasses to handle the various ways that new instances get created.

Remember that subclass instances can come about in these three ways:

#. explicit constructor call (``obj = MySubClass(params)``). This will
   call the usual sequence of ``MySubClass.__new__`` then (if it exists)
   ``MySubClass.__init__``.
#. :ref:`view-casting`
#. :ref:`new-from-template`

Our ``MySubClass.__new__`` method only gets called in the case of the
explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
``MySubClass.__init__`` to deal with the view casting and
new-from-template. It turns out that ``MySubClass.__array_finalize__``
*does* get called for all three methods of object creation, so this is
where our object creation housekeeping usually goes.

* For the explicit constructor call, our subclass will need to create a
  new ndarray instance of its own class. In practice this means that
  we, the authors of the code, will need to make a call to
  ``ndarray.__new__(MySubClass,...)``, or do view casting of an existing
  array (see below).
* For view casting and new-from-template, the equivalent of
  ``ndarray.__new__(MySubClass,...)`` is called, at the C level.

The arguments that ``__array_finalize__`` receives differ for the three
methods of instance creation above.

The following code allows us to look at the call sequences and arguments:

.. testcode::

    import numpy as np

    class C(np.ndarray):
        def __new__(cls, *args, **kwargs):
            print('In __new__ with class %s' % cls)
            return np.ndarray.__new__(cls, *args, **kwargs)

        def __init__(self, *args, **kwargs):
            # in practice you probably will not need or want an __init__
            # method for your subclass
            print('In __init__ with class %s' % self.__class__)

        def __array_finalize__(self, obj):
            print('In array_finalize:')
            print('   self type is %s' % type(self))
            print('   obj type is %s' % type(obj))

Now:

>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>

The signature of ``__array_finalize__`` is::

    def __array_finalize__(self, obj):

``ndarray.__new__`` passes ``__array_finalize__`` the new object, of our
own class (``self``), as well as the object from which the view has been
taken (``obj``). As you can see from the output above, the ``self`` is
always a newly created instance of our subclass, and the type of ``obj``
differs for the three instance creation methods:

* When called from the explicit constructor, ``obj`` is ``None``.
* When called from view casting, ``obj`` can be an instance of any
  subclass of ndarray, including our own.
* When called in new-from-template, ``obj`` is another instance of our
  own subclass, that we might use to update the new ``self`` instance.
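
These three cases can also be checked by having ``__array_finalize__``
record ``type(obj)`` instead of printing it. In the minimal sketch
below, the subclass ``Route`` and its ``source_type`` attribute are
hypothetical names used only for this check. Note that the recorded type
alone cannot always separate view casting from new-from-template,
because view casting may also start from our own subclass:

.. testcode::

    import numpy as np

    class Route(np.ndarray):
        def __array_finalize__(self, obj):
            # None for the explicit constructor, otherwise the source's class
            self.source_type = None if obj is None else type(obj)

    built = Route((3,))                   # explicit constructor
    cast = np.arange(3).view(Route)       # view casting from a plain ndarray
    sliced = cast[1:]                     # new-from-template

>>> built.source_type is None
True
>>> cast.source_type is np.ndarray, sliced.source_type is Route
(True, True)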

Because ``__array_finalize__`` is the only method that always sees new
instances being created, it is the sensible place to fill in instance
defaults for new object attributes, among other tasks.

This may be clearer with an example.

Simple example - adding an extra attribute to ndarray
-----------------------------------------------------

.. testcode::

    import numpy as np

    class InfoArray(np.ndarray):

        def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                    strides=None, order=None, info=None):
            # Create the ndarray instance of our type, given the usual
            # ndarray input arguments.  This will call the standard
            # ndarray constructor, but return an object of our type.
            # It also triggers a call to InfoArray.__array_finalize__
            obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
                                     strides, order)
            # set the new 'info' attribute to the value passed
            obj.info = info
            # Finally, we must return the newly created object:
            return obj

        def __array_finalize__(self, obj):
            # ``self`` is a new object resulting from
            # ndarray.__new__(InfoArray, ...), therefore it only has
            # attributes that the ndarray.__new__ constructor gave it -
            # i.e. those of a standard ndarray.
            #
            # We could have got to the ndarray.__new__ call in 3 ways:
            # From an explicit constructor - e.g. InfoArray():
            #    obj is None
            #    (we're in the middle of the InfoArray.__new__
            #    constructor, and self.info will be set when we return to
            #    InfoArray.__new__)
            if obj is None: return
            # From view casting - e.g. arr.view(InfoArray):
            #    obj is arr
            #    (type(obj) can be InfoArray)
            # From new-from-template - e.g. infoarr[:3]
            #    type(obj) is InfoArray
            #
            # Note that it is here, rather than in the __new__ method,
            # that we set the default value for 'info', because this
            # method sees all creation of default objects - with the
            # InfoArray.__new__ constructor, but also with
            # arr.view(InfoArray).
            self.info = getattr(obj, 'info', None)
            # We do not need to return anything

Using the object looks like this:

>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True

This class isn't very useful, because it has the same constructor as the
bare ndarray object, including passing in buffers and shapes and so on.
We would probably prefer the constructor to be able to take an already
formed ndarray from the usual numpy calls to ``np.array`` and return an
object of our own type.

Slightly more realistic example - attribute added to existing array
-------------------------------------------------------------------

Here is a class that takes a standard ndarray that already exists, casts
it as our type, and adds an extra attribute.

.. testcode::

    import numpy as np

    class RealisticInfoArray(np.ndarray):

        def __new__(cls, input_array, info=None):
            # Input array is an already formed ndarray instance
            # We first cast to be our class type
            obj = np.asarray(input_array).view(cls)
            # add the new attribute to the created instance
            obj.info = info
            # Finally, we must return the newly created object:
            return obj

        def __array_finalize__(self, obj):
            # see InfoArray.__array_finalize__ for comments
            if obj is None: return
            self.info = getattr(obj, 'info', None)

So:

>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
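
Ufunc outputs are also created from a template (see :ref:`array-wrap`),
so the ``info`` attribute should survive arithmetic as well. In the other
direction, ``np.asarray`` returns a base-class ndarray. This is why the
constructor above can accept an existing ``RealisticInfoArray`` and
relabel it without touching the original. The sketch repeats the class
definition so that it runs on its own:

.. testcode::

    import numpy as np

    class RealisticInfoArray(np.ndarray):
        def __new__(cls, input_array, info=None):
            obj = np.asarray(input_array).view(cls)
            obj.info = info
            return obj

        def __array_finalize__(self, obj):
            if obj is None: return
            self.info = getattr(obj, 'info', None)

    obj = RealisticInfoArray(np.arange(5), info='information')
    doubled = obj * 2                      # ufunc output, wrapped by obj
    plain = np.asarray(obj)                # back to a base-class ndarray
    relabelled = RealisticInfoArray(obj, info='other')

>>> type(doubled).__name__, doubled.info
('RealisticInfoArray', 'information')
>>> type(plain) is np.ndarray
True
>>> relabelled.info, obj.info
('other', 'information')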

.. _array-wrap:

``__array_wrap__`` for ufuncs
-------------------------------------------------------

``__array_wrap__`` gets called at the end of numpy ufuncs and other numpy
functions, to allow a subclass to set the type of the return value
and update attributes and metadata. Let's show how this works with an example.
First we make the same subclass as above, but with a different name and
some print statements:

.. testcode::

    import numpy as np

    class MySubClass(np.ndarray):

        def __new__(cls, input_array, info=None):
            obj = np.asarray(input_array).view(cls)
            obj.info = info
            return obj

        def __array_finalize__(self, obj):
            print('In __array_finalize__:')
            print('   self is %s' % repr(self))
            print('   obj is %s' % repr(obj))
            if obj is None: return
            self.info = getattr(obj, 'info', None)

        def __array_wrap__(self, out_arr, context=None):
            print('In __array_wrap__:')
            print('   self is %s' % repr(self))
            print('   arr is %s' % repr(out_arr))
            # then just call the parent
            return np.ndarray.__array_wrap__(self, out_arr, context)

We run a ufunc on an instance of our new array:

>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
   self is MySubClass([0, 1, 2, 3, 4])
   obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
   self is MySubClass([0, 1, 2, 3, 4])
   arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
   self is MySubClass([1, 3, 5, 7, 9])
   obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'

Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method of the
input with the highest ``__array_priority__`` value, in this case
``MySubClass.__array_wrap__``, with arguments ``self`` as ``obj``, and
``out_arr`` as the (ndarray) result of the addition. In turn, the
default ``__array_wrap__`` (``ndarray.__array_wrap__``) has cast the
result to class ``MySubClass``, and called ``__array_finalize__`` -
hence the copying of the ``info`` attribute. This has all happened at the C level.

But, we could do anything we wanted:

.. testcode::

    class SillySubClass(np.ndarray):

        def __array_wrap__(self, arr, context=None):
            return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'

So, by defining a specific ``__array_wrap__`` method for our subclass,
we can tweak the output from ufuncs. The ``__array_wrap__`` method
requires ``self``, then an argument - which is the result of the ufunc -
and an optional parameter *context*. Ufuncs pass this parameter as a
3-element tuple: (name of the ufunc, arguments of the ufunc, domain of
the ufunc). ``__array_wrap__`` should return an instance of
its containing class. See the masked array subclass for an
implementation.
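
As a sketch of how *context* might be used, the hypothetical subclass
``UfuncLog`` below stores the name of the ufunc that produced each
result. It assumes that ``context[0]`` is the ufunc object itself, which
carries a ``__name__``. It also accepts a ``return_scalar`` argument,
because recent NumPy releases may pass one. Older releases simply leave
it at its default. The exact arguments NumPy passes here have changed
between versions, so treat this as version-dependent:

.. testcode::

    import numpy as np

    class UfuncLog(np.ndarray):

        def __array_finalize__(self, obj):
            # inherit the label from the source array (None if there is none)
            self.last_ufunc = getattr(obj, 'last_ufunc', None)

        def __array_wrap__(self, out_arr, context=None, return_scalar=False):
            # let ndarray cast the result to UfuncLog first
            result = np.ndarray.__array_wrap__(self, out_arr, context)
            if context is not None:
                result.last_ufunc = context[0].__name__
            return result

    a = np.arange(3).view(UfuncLog)
    r = np.add(a, 1)

>>> a.last_ufunc is None, type(r) is UfuncLog
(True, True)
>>> r.last_ufunc
'add'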

In addition to ``__array_wrap__``, which is called on the way out of the
ufunc, there is also an ``__array_prepare__`` method which is called on
the way into the ufunc, after the output arrays are created but before any
computation has been performed. The default implementation does nothing
but pass through the array. ``__array_prepare__`` should not attempt to
access the array data or resize the array; it is intended for setting the
output array type, updating attributes and metadata, and performing any
checks based on the input that may be desired before computation begins.
Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
subclass thereof or raise an error.

Extra gotchas - custom ``__del__`` methods and ndarray.base
-----------------------------------------------------------

One of the problems that ndarray solves is keeping track of memory
ownership of ndarrays and their views. Consider the case where we have
created an ndarray, ``arr``, and have taken a slice with ``v = arr[1:]``.
The two objects are looking at the same memory. Numpy keeps track of
where the data came from for a particular array or view, with the
``base`` attribute:

>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the original array that owns the data
>>> v2.base is arr
True

In general, if the array owns its own memory, as for ``arr`` in this
case, then ``arr.base`` will be None - there are some exceptions to this -
see the numpy book for more details.

The ``base`` attribute is useful in being able to tell whether we have
a view or the original array. This in turn can be useful if we need
to know whether or not to do some specific cleanup when the subclassed
array is deleted. For example, we may only want to do the cleanup if
the original array is deleted, but not the views. For an example of
how this can work, have a look at the ``memmap`` class in
``numpy.core``.
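
The minimal sketch below illustrates that check. The hypothetical
subclass ``Tracked`` has a ``needs_cleanup`` method, also hypothetical,
that treats only an array with no ``base`` as responsible for cleanup.
It releases no resources itself. It only shows the test that a custom
``__del__`` might make before doing any cleanup:

.. testcode::

    import numpy as np

    class Tracked(np.ndarray):

        def needs_cleanup(self):
            # only an array that is not a view of another array would
            # release shared resources; its views leave that to it
            return self.base is None

    original = Tracked((4,))   # explicit constructor: owns its data
    view = original[1:]        # a view onto the original

>>> original.needs_cleanup(), view.needs_cleanup()
(True, False)
>>> view.base is original
True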

"""
from __future__ import division, absolute_import, print_function