Parallel Histograms

class parallel_statistics.ParallelHistogram(edges)

ParallelHistogram is a parallel and incremental calculator of histograms. “Incremental” means that it does not need to hold the entire data set in memory at once, and that it requires only a single pass through the data.

The usual life-cycle of this class is:

create an instance of the class (on each process if running in parallel)
repeatedly call add_data on it to add new chunks of data
call collect (supplying an MPI communicator if running in parallel)

You can also call the run method with an iterator to combine these steps into a single call.

Since histograms are usually relatively small, sparse arrays are not enabled for this class.

Bin edges must be pre-defined and values outside them will be ignored.
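
The life cycle above can be illustrated with a minimal pure-Python sketch of the serial case. `SketchHistogram` is a hypothetical stand-in written for this example, not the library's implementation. In particular it assumes half-open bins `[lo, hi)`, which may differ from how `ParallelHistogram` treats values lying exactly on an edge.

```python
from bisect import bisect_right


class SketchHistogram:
    """Hypothetical stand-in that mimics the documented life cycle."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.counts = [0.0] * (len(self.edges) - 1)

    def add_data(self, data):
        # Only the running per-bin totals are kept; the chunk itself is not stored.
        for x in data:
            i = bisect_right(self.edges, x) - 1
            # Assumed convention: bins are [lo, hi); values outside are ignored.
            if 0 <= i < len(self.counts):
                self.counts[i] += 1.0

    def collect(self):
        # Serial case: nothing to combine, just return the totals.
        return list(self.counts)


hist = SketchHistogram(edges=[0.0, 1.0, 2.0, 3.0])
# Feed the data one chunk at a time; 9.0 and -1.0 fall outside the edges.
for chunk in ([0.5, 1.5, 9.0], [2.5, -1.0], [0.1]):
    hist.add_data(chunk)
counts = hist.collect()
print(counts)
```

Because only the per-bin totals are retained between chunks, memory use in this sketch depends on the number of bins rather than on the size of the data set.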

Methods

`add_data`(data[, weights])	Add a chunk of data to the histogram.
`collect`([comm])	Finalize and collect together histogram values.
`run`(iterator[, comm])	Run the whole life cycle on an iterator returning data chunks.

add_data(data, weights=None)

Add a chunk of data to the histogram.

Parameters

data: sequence: Values to be histogrammed
weights: sequence, optional: Weights per value.
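
To show how per-value weights feed into the per-bin totals, here is a small sketch. `add_weighted` is a hypothetical helper, not part of the library. It assumes that omitting the weights is equivalent to giving every value a weight of 1, and that bins are half-open `[lo, hi)`.

```python
from bisect import bisect_right


def add_weighted(counts, edges, data, weights=None):
    """Hypothetical helper: add one chunk of data to `counts` in place."""
    if weights is None:
        # Assumption: unweighted values each contribute 1 to their bin.
        weights = [1.0] * len(data)
    for x, w in zip(data, weights):
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += w


edges = [0.0, 10.0, 20.0]
counts = [0.0, 0.0]
add_weighted(counts, edges, [1.0, 12.0, 15.0], weights=[0.5, 2.0, 1.0])
add_weighted(counts, edges, [3.0])  # no weights supplied
print(counts)
```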

collect(comm=None)

Finalize and collect together histogram values.

Parameters

comm: MPI comm or None: The MPI communicator, or None when running serially

Returns

counts: array: Total counts/weights per bin
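
Conceptually, collecting in parallel amounts to an element-wise sum of the per-bin totals held by each process. The sketch below simulates that with plain lists instead of MPI. `combine` and the per-process numbers are hypothetical, and the source does not state which processes receive the combined result, so that detail is left out.

```python
def combine(per_process_counts):
    """Sum per-bin totals across processes (stand-in for an MPI reduction)."""
    return [sum(bin_values) for bin_values in zip(*per_process_counts)]


# Hypothetical totals accumulated independently by three processes
# that all share the same bin edges.
rank_counts = [
    [2.0, 0.0, 1.0],
    [1.0, 4.0, 0.0],
    [0.0, 1.0, 1.0],
]
total = combine(rank_counts)
print(total)
```

This only works because every process uses the same pre-defined bin edges, so bin `i` means the same thing everywhere.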

run(iterator, comm=None)

Run the whole life cycle on an iterator returning data chunks.

This is equivalent to calling add_data repeatedly and then collect.

Parameters

iterator: iterator: Iterator yielding values or (values, weights) pairs
comm: MPI comm or None: The MPI communicator, or None when running serially

Returns

counts: array: Total counts/weights per bin
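
The following sketch shows the shape of iterator that this description suggests, together with a serial loop that consumes it. `run_sketch` and `chunks` are hypothetical names. How the library tells bare values apart from (values, weights) pairs is not documented here, so the sketch assumes that a tuple always means a pair and that bare values arrive as lists.

```python
from bisect import bisect_right


def run_sketch(edges, iterator):
    """Hypothetical serial equivalent of add_data in a loop, then collect."""
    counts = [0.0] * (len(edges) - 1)
    for item in iterator:
        # Assumption: a tuple is (values, weights); anything else is bare values.
        if isinstance(item, tuple):
            values, weights = item
        else:
            values, weights = item, [1.0] * len(item)
        for x, w in zip(values, weights):
            i = bisect_right(edges, x) - 1
            if 0 <= i < len(counts):
                counts[i] += w
    return counts


def chunks():
    # A generator lets each chunk be produced (e.g. read from disk) on demand.
    yield [0.2, 0.7]
    yield ([1.5, 1.6], [2.0, 0.5])


result = run_sketch([0.0, 1.0, 2.0], chunks())
print(result)
```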