Counting lines - a correction for the Cassandra test
First of all, the tests described in my previous post are correct in terms of timing and the amount of data actually stored in Cassandra. However, I made a mistake in assuming that the data file contained ~42.000.000 data points. A recount showed that there are empty lines between the data points, which cuts the count to about 21.000.000. Cassandra still crashed on the laptop after that amount of data for lack of heap space.
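For anyone who wants to check their own data file the same way, here is a minimal sketch of such a recount. It assumes one JSON data point per line with blank lines in between; the function name and the file name in the usage note are made up for illustration and are not taken from my test setup.

```python
def count_lines(path):
    """Return (total_lines, non_empty_lines) for a text file.

    Assumption: every non-empty line holds exactly one data point.
    """
    total = 0
    non_empty = 0
    # Stream the file line by line so a multi-GB file never sits in memory.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            total += 1
            # Lines that contain only whitespace count as empty.
            if line.strip():
                non_empty += 1
    return total, non_empty
```

Calling `count_lines("datapoints.json")` on a file like mine would return roughly twice as many total lines as non-empty ones, which is exactly the mistake described above.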
On one of our cluster nodes, with a bit more heap space for Cassandra and a hard disk that can actually hold the data files, the rate for storing batches rose to about 37k per second. A single put test will also follow.
To make my tests somewhat comparable, I will copy another ten days into the data file, raising the data count to ~40.000.000. That is about 75GB of raw JSON data. With the addition of UUID keys, Cassandra indexing and some data fill, the raw space Cassandra needs will blow up to an estimated ~140GB.
Did I mention that these numbers cover just 20 days?
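To put the figures above into per-point and per-day terms, here is a rough back-of-the-envelope sketch. It only uses the numbers from this post and makes a few assumptions: 1GB is taken as 10^9 bytes, the 37k per second is read as data points per second, and that rate is assumed to hold for the whole load. All names are made up for illustration.

```python
DATA_POINTS = 40_000_000      # ~40.000.000 data points after adding ten days
RAW_BYTES = 75 * 10**9        # ~75GB raw JSON (assumption: 1GB = 10^9 bytes)
STORED_BYTES = 140 * 10**9    # ~140GB estimated space inside Cassandra
DAYS = 20                     # time span covered by the data file
BATCH_RATE = 37_000           # assumption: 37k data points per second


def estimate():
    """Derive rough per-point and per-day figures from the totals."""
    return {
        "raw_bytes_per_point": RAW_BYTES / DATA_POINTS,
        "stored_bytes_per_point": STORED_BYTES / DATA_POINTS,
        "overhead_factor": STORED_BYTES / RAW_BYTES,
        "points_per_day": DATA_POINTS / DAYS,
        "raw_gb_per_day": RAW_BYTES / DAYS / 10**9,
        "stored_gb_per_day": STORED_BYTES / DAYS / 10**9,
        # Only valid if the batch rate stays constant over the whole load.
        "batch_load_minutes": DATA_POINTS / BATCH_RATE / 60,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions one data point is a little under 2KB of raw JSON and about 3.5KB once stored, which comes to roughly 2.000.000 data points and about 7GB of Cassandra space per day.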