Parallel Mean Calculation

class parallel_statistics.ParallelMean(size, sparse=False)

ParallelMean is a parallel and incremental calculator for mean statistics. “Incremental” means that it does not need to read the entire data set at once, and requires only a single pass through the data.

The calculator is designed to work on data in a collection of different bins, for example a map (where the bins are pixels). The usual life-cycle of this class is:

  • create an instance of the class (on each process if in parallel)

  • repeatedly call add_data or add_datum on it to add new data points

  • call collect (supplying an MPI communicator if in parallel)

You can also call the run method with an iterator to combine these.
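The life-cycle above can be illustrated with a minimal serial sketch. MeanSketch is a hypothetical stand-in written for this example, not the library's ParallelMean. It assumes that a missing weight counts as 1 and that the mean is the weighted sum divided by the total weight. Bins that receive no data come out with weight 0 and mean nan.

```python
import math


class MeanSketch:
    """Illustrative stand-in (assumption), not the library's ParallelMean."""

    def __init__(self, size):
        self.weight = [0.0] * size  # running sum of weights per bin
        self.total = [0.0] * size   # running sum of weight * value per bin

    def add_datum(self, bin, value, weight=None):
        w = 1.0 if weight is None else weight  # assumed default: unit weight
        self.weight[bin] += w
        self.total[bin] += w * value

    def add_data(self, bin, values, weights=None):
        if weights is None:
            weights = [None] * len(values)
        for v, w in zip(values, weights):
            self.add_datum(bin, v, w)

    def collect(self):
        # Empty bins keep weight 0 and are reported with a nan mean.
        mean = [t / w if w > 0 else math.nan
                for t, w in zip(self.total, self.weight)]
        return self.weight, mean


calc = MeanSketch(3)                   # 1. create an instance
calc.add_data(0, [1.0, 2.0, 3.0])      # 2. add a chunk to bin 0
calc.add_datum(2, 10.0, weight=2.0)    #    add single weighted points to bin 2
calc.add_datum(2, 4.0, weight=1.0)
weight, mean = calc.collect()          # 3. finalize (serial, so no communicator)
```

Only two running sums per bin are stored, which is why a single pass over the data is enough.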

If only a few bin indices are expected to be used, set the sparse option. Data is then stored and returned in a sparse form, which uses less memory and is faster below a certain size.

Bins that contain no objects are given weight=0 and mean=nan.
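To see why a sparse representation saves memory when few bins are hit, here is a rough sketch that keeps the running sums in dictionaries keyed by bin index. SparseMeanSketch is a hypothetical name, and how the library's SparseArray is actually implemented is not stated here, so treat this only as an illustration of the idea.

```python
class SparseMeanSketch:
    """Illustrative dict-based accumulator (assumption), not the library's code."""

    def __init__(self):
        self.weight = {}  # bin index -> running sum of weights
        self.total = {}   # bin index -> running sum of weight * value

    def add_datum(self, bin, value, weight=1.0):
        self.weight[bin] = self.weight.get(bin, 0.0) + weight
        self.total[bin] = self.total.get(bin, 0.0) + weight * value

    def collect(self):
        # Only bins that were actually hit appear in the output.
        mean = {b: self.total[b] / w for b, w in self.weight.items() if w > 0}
        return dict(self.weight), mean


sparse_calc = SparseMeanSketch()
sparse_calc.add_datum(7, 1.0)
sparse_calc.add_datum(7, 3.0)
sparse_calc.add_datum(123_456_789, 5.0)  # a huge index costs one entry, not a huge array
sparse_weight, sparse_mean = sparse_calc.collect()
```

Storage grows with the number of occupied bins rather than with the total number of bins.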

Methods

add_data(bin, values[, weights])

Add a chunk of data in the same bin to the sum.

add_datum(bin, value[, weight])

Add a single data point to the sum.

collect([comm, mode])

Finalize the sum and return the counts and the means.

run(iterator[, comm, mode])

Run the whole life cycle on an iterator returning data chunks.

add_data(bin, values, weights=None)

Add a chunk of data in the same bin to the sum.

Parameters
bin: int

Index of the bin or pixel these values apply to

values: sequence

Values for this bin to accumulate

weights: sequence

Optional, weights per value

add_datum(bin, value, weight=None)

Add a single data point to the sum.

Parameters
bin: int

Index of the bin or pixel this value applies to

value: float

Value for this bin to accumulate

weight: float

Optional, weight for this value

collect(comm=None, mode='gather')

Finalize the sum and return the counts and the means.

The mode decides whether all processes receive the results or just the root.

Parameters
comm: MPI communicator or None

The communicator to use when running in parallel, or None for serial

mode: str, optional

“gather” (only the root process receives the results) or “allgather” (all processes receive them)

Returns
count: array or SparseArray

The number of values hitting each pixel

mean: array or SparseArray

The mean of values hitting each pixel
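The reason the results can be finalized across processes is that per-bin counts and totals simply add. The sketch below simulates two processes with plain lists instead of an MPI communicator. The helper names partial_sums and combine are hypothetical, and unit weights are assumed. In a real parallel run the element-wise sum would be carried out through the communicator, and the mode would decide which processes receive the result.

```python
import math


def partial_sums(chunks, size):
    """Accumulate per-bin (count, total) for the chunks seen by one process."""
    count = [0.0] * size
    total = [0.0] * size
    for bin, values in chunks:
        count[bin] += len(values)
        total[bin] += sum(values)
    return count, total


def combine(partials):
    """Element-wise sum over processes, standing in for an MPI reduction."""
    size = len(partials[0][0])
    count = [sum(p[0][i] for p in partials) for i in range(size)]
    total = [sum(p[1][i] for p in partials) for i in range(size)]
    mean = [t / c if c > 0 else math.nan for t, c in zip(total, count)]
    return count, mean


# Two simulated processes, each holding part of the data for three bins.
rank0 = partial_sums([(0, [1.0, 2.0]), (1, [5.0])], size=3)
rank1 = partial_sums([(0, [6.0]), (1, [7.0])], size=3)
count, mean = combine([rank0, rank1])
```

The division happens only once, after the sums are combined, so the answer matches what a single process would get from all the data.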

run(iterator, comm=None, mode='gather')

Run the whole life cycle on an iterator returning data chunks.

This is equivalent to calling add_data repeatedly and then collect.

Parameters
iterator: iterator

Iterator yielding (pixel, values) pairs

comm: MPI communicator or None

The communicator, or None for serial

mode: str, optional

“gather” or “allgather”

Returns
count: array or SparseArray

The number of values hitting each pixel

mean: array or SparseArray

The mean of values hitting each pixel
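As a sketch of the kind of iterator run expects, the generator below groups a stream of (pixel, value) records into (pixel, values) chunks, and a small loop consumes them in a single pass. Both chunks_by_pixel and run_sketch are hypothetical helpers written for this example. They mimic the add_data then collect sequence in serial and are not the library's implementation.

```python
import math
from itertools import groupby


def chunks_by_pixel(records):
    """Yield (pixel, values) pairs from consecutive records sharing a pixel."""
    for pixel, group in groupby(records, key=lambda r: r[0]):
        yield pixel, [value for _, value in group]


def run_sketch(iterator, size):
    """Serial stand-in: accumulate every chunk, then finalize the means."""
    count = [0] * size
    total = [0.0] * size
    for pixel, values in iterator:  # one add_data-like step per chunk
        count[pixel] += len(values)
        total[pixel] += sum(values)
    mean = [t / c if c else math.nan for t, c in zip(total, count)]
    return count, mean


records = [(0, 1.0), (0, 3.0), (2, 4.0), (0, 5.0)]
count, mean = run_sketch(chunks_by_pixel(records), size=3)
```

Because the generator is lazy, only one chunk is held in memory at a time, and the same pixel may appear in several chunks.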