2010-05-13 #1

Created by arte.

Counting lines - a correction for the Cassandra test

First of all, the tests described in my previous post are correct in terms of timing and the amount of data actually stored in Cassandra. However, I made a mistake in assuming that the data file contained ~42,000,000 data points. A recount showed that there are empty lines between the data points, which reduces the data count to about 21,000,000. Even so, Cassandra crashed on the laptop after that amount of data for lack of heap space.
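
The recount comes down to skipping blank lines while counting. A minimal sketch in Python, assuming the data file holds one JSON data point per line with empty lines in between; the function name and the in-memory sample are hypothetical and not taken from the original test code:

```python
import io


def count_data_points(lines):
    """Count lines that contain data, ignoring empty separator lines."""
    return sum(1 for line in lines if line.strip())


# Hypothetical sample: two data points, each followed by an empty line.
sample = io.StringIO('{"v": 1}\n\n{"v": 2}\n\n')

total_lines = len(sample.getvalue().splitlines())  # naive count, blanks included
data_points = count_data_points(sample)            # blanks skipped

print(total_lines, data_points)
```

For a real file, passing the open file object to `count_data_points` streams it line by line, so even a very large data file is never held in memory.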

On one of our cluster nodes, with a bit more heap space for Cassandra and a hard disk that can actually hold the data files, the rate for storing batches rose to about 37k per second. A test with single puts will follow.

To make my tests somewhat comparable, I will copy another ten days into the data file, raising the data count to ~40,000,000. That amounts to about 75GB of raw JSON data. With the addition of UUID keys, Cassandra indexing, and some data fill, the raw space Cassandra needs will grow to an estimated ~140GB.
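
As a rough sanity check on these figures, the per-data-point sizes can be worked out from the numbers above. This is only a back-of-the-envelope sketch; it assumes decimal gigabytes (10^9 bytes), and the variable names are made up for illustration:

```python
# Figures quoted in the post (all approximate).
DATA_POINTS = 40_000_000     # data points after copying another ten days
RAW_JSON_GB = 75             # raw JSON size of the data file
ESTIMATED_STORED_GB = 140    # estimated raw space Cassandra needs

GB = 10**9  # assumption: decimal gigabytes

# Average size of one data point as raw JSON and as stored data.
bytes_per_point_raw = RAW_JSON_GB * GB / DATA_POINTS
bytes_per_point_stored = ESTIMATED_STORED_GB * GB / DATA_POINTS

# How much the data grows between the raw file and the estimated store size.
overhead_factor = ESTIMATED_STORED_GB / RAW_JSON_GB

print(f"raw:    {bytes_per_point_raw:.0f} bytes per data point")
print(f"stored: {bytes_per_point_stored:.0f} bytes per data point")
print(f"growth: {overhead_factor:.2f}x")
```

Under these assumptions a data point averages a little under 2KB of JSON, and the estimate implies that storage needs nearly double once keys and indexing are added.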

Did I mention that these numbers cover just 20 days?

    Copyright © 2005-2008 Matthias L. Jugel | SnipSnap 1.0b3-uttoxeter