Elasticsearch Garbage Collector

If you're using Elasticsearch, you'll sooner or later have to deal with disk-space issues. The setup I currently manage gathers 200 to 300 million docs per 24 hours, and a solution was needed to always guarantee enough free disk space so that Elasticsearch wouldn't fail.

The following bash script is the latest incarnation of this quest for an automatic “ringbuffer”:



#!/bin/bash

# Path of the lock/PID file (adjust to your setup)

  LOCKFILE=/var/run/egc.pid

# Check if Lockfile exists

  if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}` 2>/dev/null; then
    echo "EGC process already running"
    exit 1
  fi

# Make sure the Lockfile is removed
# when we exit and then claim it

  trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT

# Create Lockfile

  echo $$ > ${LOCKFILE}

# Always keep a minimum of 30GB free in logdata
# by sacrificing oldest index (ringbuffer)

  DF=$(/bin/df /dev/md0 | sed '1d' | awk '{print $4}')

  if [ ${DF} -le 30000000 ]; then
    INDEX=$(/bin/ls -1td /logdata/dntx-es/nodes/0/indices/logstash-* | tail -1 | xargs -n 1 basename)
    curl -XDELETE "http://localhost:9200/${INDEX}"
  fi

# Check & clean elasticsearch logs
# if disk usage is > 10GB

  DU=$(/usr/bin/du -s /var/log/elasticsearch/ | awk '{print $1}')

  if [ ${DU} -ge 10000000 ]; then
    rm /var/log/elasticsearch/elasticsearch.log.20*
  fi

# Remove Lockfile

  rm -f ${LOCKFILE}

  exit 0

Make sure to check and modify the script to reflect your particular setup: your paths and device names are very likely different.

It runs every 10 minutes (as a cron job) and checks the available space on the device where Elasticsearch stores its indices. In this example /dev/md0 is mounted on /logdata. If md0 has less than 30GB of free disk space, it automagically finds the oldest Elasticsearch index and drops it via Elasticsearch's REST API without service interruption (no stop/restart of Elasticsearch required).
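Scheduling it every 10 minutes is a standard cron entry; a sketch, assuming the script was saved as /usr/local/bin/egc.sh (path and name are placeholders):

```
# m    h    dom  mon  dow   command
*/10   *    *    *    *     /usr/local/bin/egc.sh >/dev/null 2>&1
```

Add it with crontab -e for a user that has permission to delete the log files and write the lock file (typically root).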

A simple locking mechanism prevents multiple instances from running in case of timing issues. All you need is curl for it to work, and it will increase your storage efficiency: you can always keep as much past data available as your storage allows, without the risk of full-disk issues or the hassle of manual monitoring and maintenance.
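The locking pattern can be exercised on its own. Here is a minimal, self-contained sketch (the lock path /tmp/egc-demo.pid is illustrative) that claims and releases the lock the same way the script does:

```shell
#!/bin/bash
# Demonstration of the PID-file locking pattern used above.
LOCKFILE=/tmp/egc-demo.pid

# Refuse to start if another instance holds the lock and its PID is alive
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
  echo "EGC process already running"
  exit 1
fi

# Release the lock on any exit, then claim it with our own PID
trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT
echo $$ > "${LOCKFILE}"

echo "lock acquired by PID $$"

# Work would happen here; finally, release the lock
rm -f "${LOCKFILE}"
```

A second copy started while the first still runs reads the live PID from the file, sees kill -0 succeed, and exits immediately; a stale lock file left by a crashed run fails the kill -0 check and is simply overwritten.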


Really good post

chrono, 2014/02/20 15:06

The script has been updated to only find logstash-* indexes, otherwise it could tamper with river or marvel/kibana indexes.
