Working with datasets¶
Dataset creation¶
Todo
There will be something like a create_dataset() method for group_t. However, the signature is not entirely determined yet. The only thing we have right now is some pseudocode:
dataset_t dset = group.create_dataset(...);
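As a purely hypothetical sketch, the signature could mirror the C function H5Dcreate and take a name, a datatype and a dataspace. The type names datatype_t and dataspace_t below are placeholders, not part of any decision yet.

//hypothetical signature, loosely mirroring H5Dcreate -- nothing here is decided yet
dataset_t create_dataset(const std::string &name,
                         const datatype_t &type,     //placeholder type name
                         const dataspace_t &space);  //placeholder type name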
Extending the size of a dataset¶
Provided that the dataset has a chunked layout, one can alter its size.
Todo
It is not yet determined how the interface should look. Maybe we should try to stick to the C API names as closely as possible in order to avoid confusion for people who know the C API.
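If we stick close to the C API, the call could mirror H5Dset_extent. The following is only a sketch; extent() is a placeholder name, while dims_t and current_dimensions() are taken from the partial IO example below.

//hypothetical interface mirroring H5Dset_extent -- extent() is a placeholder name
dataset_t dset = group["temperatures"];
dims_t dims = dset.current_dimensions();
dims[0] += 1;       //grow the dataset by one element along the first dimension
dset.extent(dims);  //would translate to H5Dset_extent in the C API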
Reading and writing data¶
To read and write an entire dataset use the read() and write() methods (see also Dataset creation).
dataset_t dset = group["temperatures"];
std::vector<double> temperatures;
dset.read(temperatures);
//or for writing
dset.write(temperatures);
Partial IO¶
In many cases the total amount of data stored in a single dataset might be too large to fit in your machine's memory. In such situations we have to read the data in smaller pieces. This can be done using partial IO.
dataset_t dset = group["images"];
dims_t cdims = dset.current_dimensions();
std::vector<uint16> frame(cdims[1]*cdims[2]);
index_range_t rx(0,cdims[1]), ry(0,cdims[2]);
for(size_t frame_index=0;frame_index<cdims[0];frame_index++)
{
//the dataset_t::operator()(....) uses a variadic template to
//gather all the indexes and returns an instance of hyperslab_t
//which by itself provides the read and write methods to
//read and write data to the selection determined by the hyperslab.
dset(frame_index,r1,r2).read(frame);
//do something with the frame
}
Dataset container adapter¶
To simplify the above concept of looping along a particular dimension of a dataset one could use the container_adapter_t class:
class container_adapter_t
{
  public:
    using const_iterator = ...;

    //access to the element (hyperslab) at a given index along the adapted dimension
    hyperslab_t operator[](size_t index) const;
    hyperslab_t at(size_t index) const;

    //number of elements along the adapted dimension
    size_t size() const;

    //append a single scalar value
    template<
      typename T,
      typename std::enable_if<!is_container<T>::value,int>::type = 0
    >
    void push_back(const T &value);

    //append an entire container (for instance one image frame)
    template<
      typename T,
      typename = std::enable_if_t<is_container<T>::value>
    >
    void push_back(const T &value);

    //iterators over the elements along the adapted dimension
    const_iterator begin() const;
    const_iterator end() const;
};
The container adapter for datasets provides an STL-compliant container interface for a multidimensional dataset along one dimension.
Reading data from a dataset then reduces to a simple loop over the adapter:
using frame_t = std::vector<uint16_t>;
h5::dataset_t d = group["detector_data"];
//container adapter for dataset d along the first dimension
container_adapter_t frames(d,0);
frame_t frame;
for(auto slab: frames)
{
    slab.read(frame); //read the current frame

    //process the frame
}
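Since the adapter also exposes operator[], at() and size(), individual frames could be addressed by index as well. The following lines are only a sketch based on the interface shown above and reuse frames and frame from the previous example.

//random access through the adapter (sketch, reusing the objects from above)
frames[frames.size()-1].read(frame); //hyperslab of the last frame
frames.at(0).read(frame);            //at() could additionally perform bounds checking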
Appending data¶
The container_adapter_t template also provides a push_back() method.
using frame_t = std::vector<uint32_t>;
h5::dataset_t d = group["detector_data"];
container_adapter_t frames(d,0);
frame_t data;
while(measurement_running)
{
    data = .....;           //read some data
    frames.push_back(data); //store data in the dataset
}
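The scalar overload of push_back() would make appending to a one-dimensional dataset equally simple. Again, this is only a sketch assuming the adapter interface above; read_temperature() stands for some data acquisition function (it also appears in the stream example below).

//appending scalar values to a 1D dataset (sketch)
h5::dataset_t log = group["temperatures"];
container_adapter_t temperatures(log,0);

while(measurement_running)
{
    temperatures.push_back(read_temperature()); //scalar overload of push_back()
}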
Stream IO¶
It would be nice to have something like IO streams for datasets.
h5::dataset_t dataset = group["temperatures"];
//create a new stream along the first dimension of a dataset
h5::dataset_stream_t stream(dataset,0);
while(measurement_running)
{
    double temperature = read_temperature();
    stream<<temperature;
}
Or, the other way around, for reading:
h5::dataset_t dataset = group["temperatures"];
h5::dataset_stream_t stream(dataset,0);
double temperature = 0.0;

while(!stream.eof())
{
    stream>>temperature;
}
A possible implementation could be based upon the container_adapter_t:
class dataset_stream_t
{
  private:
    container_adapter_t _adapter; //element-wise access to the dataset
    size_t _position;             //current position along the adapted dimension
  public:
    dataset_stream_t();
};
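A minimal sketch of how the stream operators used above could look on top of the adapter, assuming the corresponding member declarations were added to the class; only _adapter and _position come from the sketch above, everything else is an assumption.

//sketch only -- assumes the container_adapter_t interface shown earlier
template<typename T>
dataset_stream_t &dataset_stream_t::operator<<(const T &value)
{
    _adapter.push_back(value);           //writing appends along the adapted dimension
    return *this;
}

template<typename T>
dataset_stream_t &dataset_stream_t::operator>>(T &value)
{
    _adapter[_position++].read(value);   //reading walks along the adapted dimension
    return *this;
}

bool dataset_stream_t::eof() const
{
    return _position >= _adapter.size(); //no elements left along the adapted dimension
}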