In addition to this helpful guide, note that statsd / graphite both spring some unfortunate surprises on new users, e.g., graphite changing your data across retention rates and time scales [0], graphite changing your data at different plot widths (?!) [1], statsd believing that only count and time data deserve to be aggregated [2], etc.
I have no alternative to suggest, however. Perhaps Cube [3], but unclear if it has any user community.
Re [0]: If you never want your data downsampled, keep data at a single resolution which is equal to the flush interval used to push data to Graphite. Carbon will never "change your data" under such a configuration.
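Concretely, if statsd flushes every 10 seconds, a single-resolution storage-schemas.conf entry like this keeps every point at its native resolution, so carbon never has anything to downsample (the section name and retention length are just illustrations, match the pattern to your own metric namespace):

```ini
[statsd]
pattern = ^stats\.
retentions = 10s:30d
```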
Re [1]: How would you expect the presentation layer to present >n data points using n pixels?
Graphite doesn't "change your data". Presentation of data != the data itself, just as a map of a city != the city itself.
> If you never want your data downsampled, keep data at a single resolution...
Sure, and many people do exactly that. The point is that a new user to graphite is likely to be surprised by this behavior. (I would further bet that a reasonable fraction of statsd+graphite users end up viewing incorrect data without realizing it, especially given the statsd focus on count data, for which the default aggregationMethod setting is exactly the wrong choice.)
(And even awareness of this behavior isn't quite enough, since every user needs to also remember their server's exact storage configuration, lest they inadvertently expand their plot across a retention boundary.)
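For anyone bitten by the count-data problem above: counter metrics should be downsampled by summing, not by the default averaging, which you can set per-pattern in storage-aggregation.conf (the section name and pattern here are illustrative, adjust them to your statsd prefix):

```ini
[statsd_counts]
pattern = ^stats_counts\.
xFilesFactor = 0
aggregationMethod = sum
```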
> How would you expect the presentation layer to present >n data points using n pixels?
The same way that most plotting tools do so: by overdrawing. Yes, one ends up with a solid block of pixels if the data are noisy and the plot is small, but that outcome is easily understood and has the easily understood solution of explicitly aggregating appropriately. Graphite instead takes the approach of implicitly aggregating based on how wide the plot is rendered in a given interface. That behavior is, at the very least, surprising.
It's not a bug; carbon was behaving exactly the way it was configured to behave. This wouldn't be surprising to anyone who is familiar with RRDTool. However, since one of the reasons graphite uses its own file format (whisper) instead of RRD is to better handle intermittent values, I could see the argument that the default xFilesFactor should be higher.
"xFilesFactor should be a floating point number between 0 and 1, and specifies what fraction of the previous retention level’s slots must have non-null values in order to aggregate to a non-null value. The default is 0.5." - http://graphite.readthedocs.org/en/1.0/config-carbon.html
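So if your metrics are intermittent, you can loosen that per-pattern in storage-aggregation.conf. A 0 here means "aggregate to a non-null value even if only one slot in the window was non-null" (section name and pattern are made up for illustration):

```ini
[sparse_metrics]
pattern = \.sparse\.
xFilesFactor = 0
aggregationMethod = average
```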
Oh, yeah, it's weird but not even a serious limitation: it's being fixed, other statsd clones have more features, and you can always pretend your data are time intervals. statsd is limited but nicely simple.
One nitpick: you don't need to use statsd as an intermediary in order for your application to send metrics via UDP; just set ENABLE_UDP_LISTENER to True in carbon.conf and graphite will accept metrics over UDP itself. Other options are TCP (obviously) and AMQP.
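A minimal sketch of sending a metric straight to carbon over UDP from Python (the host and metric path are placeholders; 2003 is carbon's default plaintext port):

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_udp(path, value, host="graphite.example.com", port=2003):
    """Fire-and-forget one datapoint at carbon's UDP listener."""
    line = format_metric(path, value)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(line.encode("ascii"), (host, port))
    sock.close()
```

Since it's UDP there's no delivery guarantee, which is exactly the tradeoff you're making when you skip statsd anyway.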
I love how simple Graphite's plaintext protocol is; it's nothing more than a line of text with <metric path> <metric value> <metric timestamp>. This has led lots of software to integrate graphite support and makes it easy to do yourself. In a pinch I've even set up a cronjob reading a value from /proc and sending it to graphite via netcat.
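The /proc trick is a few lines in any language; here's a Python sketch of the same idea (the metric name and graphite host are assumptions, and it uses plain TCP to port 2003):

```python
import socket
import time

def loadavg_line(proc_text, now=None):
    """Turn the first field of /proc/loadavg into one Graphite plaintext line."""
    load1 = proc_text.split()[0]
    if now is None:
        now = int(time.time())
    return "system.loadavg.1min %s %d\n" % (load1, now)

def report_loadavg(host="graphite.example.com", port=2003):
    with open("/proc/loadavg") as f:
        line = loadavg_line(f.read())
    sock = socket.create_connection((host, port), timeout=5)
    sock.sendall(line.encode("ascii"))
    sock.close()
```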
Graphite shines at generating graphs, but its ability to return JSON is also very useful. For example, I've written a script (https://github.com/sciurus/grallect) that plugs into Nagios and generates alerts based on system metrics sent by Collectd to Graphite.
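As a sketch of that pattern (not grallect itself; the URL shape and threshold logic are placeholders), the render API with format=json returns each series as a list of [value, timestamp] pairs, which makes threshold checks straightforward:

```python
import json
from urllib.request import urlopen

def latest_value(series):
    """Most recent non-null value from one render-API series, or None."""
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None

def check_threshold(payload, warn):
    """payload is the parsed JSON list the render API returns;
    return (target, value) pairs whose latest value exceeds warn."""
    alerts = []
    for series in payload:
        value = latest_value(series)
        if value is not None and value > warn:
            alerts.append((series["target"], value))
    return alerts

def fetch(url):
    # e.g. http://graphite.example.com/render?target=...&format=json
    return json.loads(urlopen(url).read().decode("utf-8"))
```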
My two frustrations with graphite:
You have to choose a single aggregation method. I'd like to be able to store the average, minimum, and maximum values.
Sometimes I find it hard to query for the data I want. E.g., to check the percentage of space used on each filesystem I have to fetch example.com.df-*.df_complex-used and example.com.df-*.df_complex-free separately and calculate the percentages myself, because asPercent(example.com.df-*.df_complex-{used,free}) would combine all the filesystems.
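Until asPercent can pair series up per filesystem, the workaround is to fetch used and free with format=json and match them yourself on the df-<fs> path component. A sketch, assuming the collectd naming above:

```python
def used_percentages(used_series, free_series):
    """Pair df_complex-used/-free series by their df-<fs> path component
    and return {filesystem: percent used} from the latest datapoints."""
    def latest(series):
        for value, _ts in reversed(series["datapoints"]):
            if value is not None:
                return value
        return None

    def fs_name(target):
        # e.g. "example.com.df-root.df_complex-used" -> "df-root"
        return [part for part in target.split(".") if part.startswith("df-")][0]

    free_by_fs = {fs_name(s["target"]): latest(s) for s in free_series}
    result = {}
    for s in used_series:
        fs = fs_name(s["target"])
        used, free = latest(s), free_by_fs.get(fs)
        if used is not None and free is not None and used + free > 0:
            result[fs] = 100.0 * used / (used + free)
    return result
```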
Feel free to point out useful things graphite could do better (constructively only) and/or some of your favorite posts or tools used with graphite. We aren't too far off from two quite massive releases (0.9.11 / 0.10) and are thinking about departing from some of the legacy bits moving forward. I'm looking at you, Python 2.4.
The process of getting graphite web up and running on Mac seemed pretty involved since it's broken up into 3 packages and depends on Cairo which can be finicky.
First of all, thanks a lot for such an awesome tool!
I think graphite would greatly improve by having the "web" part split into different apps/packages (API, graphing and frontend/dashboard).
That way, people could install whatever they wanted. Imagine "only" having to improve the backend while other people create amazing dashboards (which, right now, is already happening anyway... there's such a fragmentation in the available frontends...)
I'd love to help with the split if you deem it worthy & if you need a hand :) I will create a GH issue anyway :D
I use a statsd-compatible alternative called statsite.[1]
It's written in pure C and behaves like you would expect statsd to, with some additional improvements. I'm definitely more comfortable deploying it than installing and managing a node.js application.
Interesting: 37signals released their own Go-based version of statsd, https://github.com/noahhl/go-batsd, probably for the same reasons you rewrote it in pure C.
The main reason being that StatsD will max out at about 10K OPS (unless they've improved it recently) whereas Statsite will reach 10 MM. Also, look at the difference between the implementation of sets. StatsD uses a JS object[1] versus statsite using a C implementation of HyperLogLog[2][3]. If you're doing anything significant, you should not be using the node.js version of StatsD.
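For the curious, the core of HyperLogLog fits in a few lines. This is a toy Python sketch of the algorithm, not statsite's C implementation: hash each item, use the top p bits to pick a register, and record the longest run of leading zeros seen in the remaining bits; the harmonic mean of the registers estimates the cardinality.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16) & ((1 << 64) - 1)
        idx = h >> (64 - self.p)     # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        # small-range correction: fall back to linear counting
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(float(self.m) / zeros)
        return int(est)
```

The memory cost is fixed (m small registers) no matter how many distinct items you feed it, which is exactly why it beats keeping every member in a JS object.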
CentOS isn't right for everyone, oddly because of the Enterprise focus for stability. The system libraries on CentOS end up being quite old from the viewpoint of a lot of developers.
Sometimes the best solution is to ignore the system provided libraries and build your own environment.
I only ran into one problem (and they quickly accepted my pull request to fix it) building and using RPMs from the spec files at https://github.com/dcarley/graphite-rpms
[0] http://stackoverflow.com/questions/10820119/graphite-is-not-...
[1] http://graphite.readthedocs.org/en/1.0/functions.html#graphi...
[2] https://github.com/etsy/statsd/issues/98
[3] https://github.com/square/cube