
2010-05-13 #1

Created by arte.

counting lines - a correction for the cassandra test

First of all, the tests described in my previous post are correct in terms of timing and the amount of data actually stored in Cassandra. However, I made a mistake in assuming that the data file contained ~42,000,000 data points. A recount showed that there are empty lines between the data points, which reduces the count to about 21,000,000. Cassandra still crashed on the laptop after that amount of data for lack of heap space.
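The kind of recount described above can be sketched in a few lines of Python. This is only an illustration, assuming one data point per non-empty line; the function name `count_data_points` and the file name in the usage note are hypothetical and not taken from the original test setup.

```python
def count_data_points(path):
    """Return (total_lines, non_empty_lines) for a text file.

    Assumption: every line that contains something other than
    whitespace is exactly one data point.
    """
    total_lines = 0
    non_empty_lines = 0
    with open(path, "r", encoding="utf-8") as data_file:
        for line in data_file:
            total_lines += 1
            # strip() removes the newline and any surrounding whitespace,
            # so blank separator lines evaluate to an empty string
            if line.strip():
                non_empty_lines += 1
    return total_lines, non_empty_lines
```

Calling `count_data_points("data.json")` on a file that alternates data lines and empty lines should report roughly twice as many total lines as data points, which is the discrepancy described here.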

On one of our cluster nodes, with a bit more heap space for Cassandra and a hard disk that can actually hold the data files, the rate for storing batches rose to about 37k per second. A single-put test will follow as well.

To make my tests somewhat comparable, I will copy another ten days into the data file, raising the data count to ~40,000,000. That is about 75 GB of raw JSON data. With the addition of UUID keys, Cassandra indexing and some data fill, the raw space Cassandra needs will blow up to an estimated ~140 GB.
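For a rough feeling of what these estimates mean per data point, the arithmetic can be written down as a small Python sketch. The inputs are the approximate figures from this post; treating 1 GB as 10^9 bytes and spreading the data evenly over 20 days are assumptions made only for this illustration.

```python
# Approximate figures from the post (all are estimates)
DATA_POINTS = 40_000_000        # ~40,000,000 data points
DAYS = 20                       # the data file covers about 20 days
RAW_BYTES = 75e9                # ~75 GB raw JSON, assuming 1 GB = 10**9 bytes
ESTIMATED_STORED_BYTES = 140e9  # ~140 GB estimated space inside Cassandra


def bytes_per_point(total_bytes, points):
    """Average size of one data point in bytes."""
    return total_bytes / points


def overhead_factor(stored_bytes, raw_bytes):
    """How much larger the stored data is than the raw data."""
    return stored_bytes / raw_bytes


raw_per_point = bytes_per_point(RAW_BYTES, DATA_POINTS)
stored_per_point = bytes_per_point(ESTIMATED_STORED_BYTES, DATA_POINTS)
factor = overhead_factor(ESTIMATED_STORED_BYTES, RAW_BYTES)
points_per_day = DATA_POINTS / DAYS  # assumes an even spread over the days

print(f"raw bytes per data point:    {raw_per_point:.0f}")
print(f"stored bytes per data point: {stored_per_point:.0f}")
print(f"overhead factor:             {factor:.2f}")
print(f"data points per day:         {points_per_day:.0f}")
```

With these inputs a data point averages a little under 2 KB of raw JSON, and the estimated storage overhead is a factor of a bit under 1.9.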

Did I mention that these numbers cover just 20 days?

    Copyright © 2005-2008 Matthias L. Jugel | SnipSnap 1.0b3-uttoxeter