Elasticsearch Garbage Collector

If you're using Elasticsearch, you'll sooner or later have to deal with disk-space issues. The setup I currently manage gathers 200 to 300 million documents per 24 hours, so a solution was needed to guarantee enough free disk space at all times so that Elasticsearch wouldn't fail.

The following bash script is the latest incarnation of this quest for an automatic “ringbuffer”:

#!/bin/bash

  LOCKFILE=/var/run/egc.lock

# Check if Lockfile exists

  if [ -e ${LOCKFILE} ] && kill -0 $(cat ${LOCKFILE}) 2>/dev/null; then
    echo "EGC process already running"
    exit 1
  fi

# Make sure the Lockfile is removed 
# when we exit and then claim it

  trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT

# Create Lockfile

  echo $$ > ${LOCKFILE}

# Always keep a minimum of 30GB free in logdata
# by sacrificing oldest index (ringbuffer)
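# (df reports 1K blocks, so 30000000 equals roughly 30GB)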

  DF=$(/bin/df /dev/md0 | sed '1d' | awk '{print $4}')

  if [ ${DF} -le 30000000 ]; then
    INDEX=$(/bin/ls -1td /logdata/dntx-es/nodes/0/indices/logstash-* | tail -1 | xargs -n 1 basename)
    curl -XDELETE "http://localhost:9200/${INDEX}"
  fi

# Check & clean elasticsearch logs 
# if disk usage is > 10GB
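# (du also reports 1K blocks, so 10000000 equals roughly 10GB)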

  DU=$(/usr/bin/du -s /var/log/elasticsearch/ | awk '{print $1}')

  if [ ${DU} -ge 10000000 ]; then
    rm -f /var/log/elasticsearch/elasticsearch.log.20*
  fi

# Remove Lockfile

  rm -f ${LOCKFILE}

  exit 0

Make sure to check and modify the script to reflect your particular setup: it's very likely that your paths and device names differ.
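If your layout differs a lot, it may help to collect the site-specific values at the top of the script first. A minimal sketch, with hypothetical variable names (the values are the ones used above):

  DEVICE=/dev/md0                               # device holding the index store
  INDICES=/logdata/dntx-es/nodes/0/indices      # Elasticsearch data directory
  ES_URL=http://localhost:9200                  # Elasticsearch REST endpoint
  MIN_FREE=30000000                             # minimum free space in 1K blocks (~30GB)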

It runs every 10 minutes (as a cron job) and checks the available space on the device where Elasticsearch stores its indices. In this example, /dev/md0 is mounted on /logdata. If md0 has less than 30GB of free disk space, the script automagically finds the oldest Elasticsearch index and drops it via Elasticsearch's REST API, without service interruption (no stop/restart of Elasticsearch required).
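A crontab entry like the following would run it every 10 minutes (the script location /usr/local/bin/egc.sh is just an assumption; use wherever you saved it):

  */10 * * * * /usr/local/bin/egc.sh >/dev/null 2>&1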

A simple locking mechanism prevents multiple instances from running at once in case of timing issues. All you need for it to work is curl. The result is better storage efficiency: you can always keep as much past data available as your storage allows, without the risk of full-disk issues or the hassle of manual monitoring and maintenance.
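Before enabling the actual delete, it's worth verifying which index would be sacrificed. A quick dry run reusing the same pipeline as the script (adjust the path to your setup):

  INDEX=$(/bin/ls -1td /logdata/dntx-es/nodes/0/indices/logstash-* | tail -1 | xargs -n 1 basename)
  echo "Would delete: ${INDEX}"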

Mission-Log entry created by chrono on 2013/07/17 11:51 UTC

Discussion

Dennis
2013/12/04 09:22

Really good post

chrono
2014/02/20 15:06

The script has been updated to only find logstash-* indexes; otherwise it could tamper with river or marvel/kibana indexes.
