Parallel Histograms

class parallel_statistics.ParallelHistogram(edges)

ParallelHistogram is a parallel and incremental calculator for histograms. “Incremental” means that it does not need to read the entire data set at once and requires only a single pass through the data.

The usual life-cycle of this class is:

  • create an instance of the class (on each process if in parallel)

  • repeatedly call add_data on it to add new chunks of data

  • call collect (supplying an MPI communicator if in parallel)
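The life cycle above can be mimicked by a small stand-in class. SketchHistogram is hypothetical and only mirrors the documented method names using NumPy. The parallel branch of collect assumes an mpi4py communicator and uses comm.allreduce (whose default operation is a sum), so every rank receives the total. That choice is an assumption and not necessarily what ParallelHistogram.collect does.

```python
import numpy as np


class SketchHistogram:
    """Hypothetical stand-in mirroring the ParallelHistogram life cycle."""

    def __init__(self, edges):
        # Bin edges are fixed up front and never change
        self.edges = np.asarray(edges, dtype=float)
        self.counts = np.zeros(len(self.edges) - 1)

    def add_data(self, data, weights=None):
        # Histogram one chunk and add it to the running totals
        chunk_counts, _ = np.histogram(data, bins=self.edges, weights=weights)
        self.counts += chunk_counts

    def collect(self, comm=None):
        # Serial case: the local totals are the final answer
        if comm is None:
            return self.counts
        # Parallel case (assumption): sum the per-rank totals through an
        # mpi4py communicator so that every rank receives the result
        return comm.allreduce(self.counts)


# 1. create an instance (one per process when running under MPI)
hist = SketchHistogram(edges=[0.0, 1.0, 2.0, 3.0])
# 2. repeatedly add chunks of data
hist.add_data([0.5, 1.5, 1.7])
hist.add_data([2.2, 2.9, 0.1])
# 3. finalize; pass e.g. MPI.COMM_WORLD instead of None when in parallel
counts = hist.collect()
print(counts.tolist())  # [2.0, 2.0, 2.0]
```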

Alternatively, call the run method with an iterator to perform all of these steps in a single call.

Since histograms are usually relatively small, sparse arrays are not enabled for this class.

Bin edges must be pre-defined and values outside them will be ignored.
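For intuition, NumPy's numpy.histogram behaves the same way when given explicit edges: values outside them are dropped. Note that NumPy treats the last bin as closed on the right, so a value equal to the final edge is counted; whether ParallelHistogram uses the same convention at that boundary is not stated here. The data below are made up for illustration.

```python
import numpy as np

edges = [0.0, 1.0, 2.0, 3.0]
data = [-0.5, 0.5, 1.5, 2.5, 7.0]  # -0.5 and 7.0 lie outside the edges

# Only the three in-range values land in a bin
counts, _ = np.histogram(data, bins=edges)
print(counts.tolist())  # [1, 1, 1]
```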

Methods

  add_data(data[, weights])   Add a chunk of data to the histogram.
  collect([comm])             Finalize and collect together histogram values.
  run(iterator[, comm])       Run the whole life cycle on an iterator returning data chunks.

add_data(data, weights=None)

Add a chunk of data to the histogram.

Parameters
data: sequence

Values to be histogrammed

weights: sequence, optional

Weights per value.
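Assuming add_data follows the usual weighted-histogram convention, each value contributes its weight to its bin instead of 1, which is why the result is described as total counts/weights per bin. A plain NumPy illustration with made-up values:

```python
import numpy as np

edges = [0.0, 1.0, 2.0]
data = [0.2, 0.8, 1.5]
weights = [2.0, 3.0, 0.5]

# Bin 0 receives 2.0 + 3.0, bin 1 receives 0.5
weighted, _ = np.histogram(data, bins=edges, weights=weights)
print(weighted.tolist())  # [5.0, 0.5]
```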

collect(comm=None)

Finalize and collect together histogram values

Parameters
comm: MPI comm or None

The MPI communicator, or None for serial use

Returns
counts: array

Total counts/weights per bin
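Because histogram counts are additive, combining results across processes reduces to an elementwise sum of each process's partial counts, which is what an MPI sum-reduction computes. The sketch below simulates four ranks inside a single process with plain NumPy, so it runs without MPI; the number of ranks and the way the data are split are assumptions for illustration.

```python
import numpy as np

edges = np.linspace(-3.0, 3.0, 13)
rng = np.random.default_rng(seed=0)
data = rng.normal(size=4_000)

# Simulate four ranks, each histogramming only its own share of the data
n_ranks = 4
partials = [
    np.histogram(share, bins=edges)[0]
    for share in np.array_split(data, n_ranks)
]

# The collection step amounts to summing the partial counts bin by bin
total = np.sum(partials, axis=0)

# Same answer as histogramming all the data in one place
reference, _ = np.histogram(data, bins=edges)
assert np.array_equal(total, reference)
```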

run(iterator, comm=None)

Run the whole life cycle on an iterator returning data chunks.

This is equivalent to calling add_data repeatedly and then collect.

Parameters
iterator: iterator

Iterator yielding values or (values, weights) pairs

comm: MPI comm or None

The MPI communicator, or None for serial use

Returns
counts: array

Total counts/weights per bin
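The dispatch that run performs on its iterator can be sketched as a hypothetical standalone function, run_histogram, built on plain NumPy rather than the library's own method. Telling the two item forms apart by checking for a 2-tuple is an assumption of this sketch (a bare tuple of two values would be misread as a pair), as is the use of an mpi4py comm.allreduce in the parallel case.

```python
import numpy as np


def run_histogram(edges, iterator, comm=None):
    """Hypothetical sketch: add every chunk, then collect."""
    counts = np.zeros(len(edges) - 1)
    for item in iterator:
        # Items may be plain values or (values, weights) pairs
        if isinstance(item, tuple) and len(item) == 2:
            values, weights = item
        else:
            values, weights = item, None
        chunk_counts, _ = np.histogram(values, bins=edges, weights=weights)
        counts += chunk_counts
    # Serial result, or a sum over all ranks if an mpi4py comm is given
    return counts if comm is None else comm.allreduce(counts)


def weighted_chunks():
    # A generator standing in for data read from disk chunk by chunk
    yield np.array([0.5, 1.5]), np.array([1.0, 2.0])
    yield np.array([1.2, 2.8]), np.array([0.5, 0.25])


print(run_histogram([0.0, 1.0, 2.0, 3.0], weighted_chunks()).tolist())
# [1.0, 2.5, 0.25]
```

Using a generator keeps only one chunk in memory at a time, which matches the single-pass design described at the top of this page.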