
Elasticsearch Garbage Collector

If you're using Elasticsearch, you'll sooner or later have to deal with disk space issues. The setup I currently manage gathers 200 to 300 million docs per 24 hours, so I needed a solution that always guarantees enough free disk space so that Elasticsearch won't fail.

The following bash script is the latest incarnation of this quest for an automatic “ringbuffer”:

#!/bin/bash

  LOCKFILE=/var/run/egc.lock

# Check if another instance is already running
# (lockfile exists and its PID is still alive)

  if [ -e "${LOCKFILE}" ] && kill -0 "$(cat ${LOCKFILE})" 2>/dev/null; then
    echo "EGC process already running"
    exit 1
  fi

# Make sure the lockfile is removed
# when we exit and then claim it

  trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT

# Create lockfile containing our PID

  echo $$ > "${LOCKFILE}"

# Always keep a minimum of 30GB free on the logdata
# device by sacrificing the oldest index (ringbuffer)

  DF=$(/bin/df /dev/md0 | sed '1d' | awk '{print $4}')

  if [ "${DF}" -le 30000000 ]; then
    INDEX=$(/bin/ls -1td /logdata/dntx-es/nodes/0/indices/logstash-* | tail -1 | xargs -n 1 basename)
    curl -XDELETE "http://localhost:9200/${INDEX}"
  fi

# Check & clean elasticsearch logs
# if their disk usage exceeds 10GB
# (du -s so we get a single total, not one line per subdir)

  DU=$(/usr/bin/du -s /var/log/elasticsearch/ | awk '{print $1}')

  if [ "${DU}" -ge 10000000 ]; then
    rm /var/log/elasticsearch/elasticsearch.log.20*
  fi

# Remove lockfile

  rm -f "${LOCKFILE}"

  exit 0

Make sure to check and modify the script to reflect your particular setup: your paths and device names are almost certainly different.
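
To find the values that apply to your installation, something like this should do (the mount point and index path below are just this setup's values, adjust them to your path.data setting):

# Which device backs the Elasticsearch data directory?
  df -h /logdata

# Where do the indices actually live on disk?
  ls -1td /logdata/dntx-es/nodes/0/indices/logstash-*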

It runs every 10 minutes (as a cron job) and checks the available space on the device where Elasticsearch stores its indices. In this example /dev/md0 is mounted on /logdata. If md0 has less than 30GB of free disk space, the script automagically finds the oldest Elasticsearch index and drops it via Elasticsearch's REST API, without service interruption (no stop/restart of Elasticsearch required).
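
A matching cron entry could look like this (the script location /usr/local/sbin/egc.sh is just an assumption, put it wherever you keep your admin scripts):

# /etc/cron.d/egc - run the garbage collector every 10 minutes
  */10 * * * * root /usr/local/sbin/egc.sh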

A simple locking mechanism prevents multiple instances from running in case of timing issues. All you need for it to work is curl. It will increase your storage efficiency so that you always have as much past data available as your storage allows, without the risk of full-disk issues or the hassle of manual monitoring and maintenance.
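
To verify that the ringbuffer is doing the right thing, you can list all indices with their sizes through the same REST API (the _cat/indices endpoint should be available on any reasonably recent Elasticsearch; the oldest logstash indexes should disappear over time):

  curl "http://localhost:9200/_cat/indices?v"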

Discussion

Really good post

chrono, 2014/02/20 15:06

The script has been updated to only find logstash-* indexes, otherwise it could tamper with river or marvel/kibana indexes.
