For Oracle, hugepages are canon, and what is good for Oracle is good for Postgres too :)
1. What are hugepages
The kernel provides memory spaces (virtual memory) to applications, but it does so in blocks called pages, typically 4 kB or 2 MB. (It is of course more practical to manage memory in bundles than byte by byte :)). Either way, this brings some overhead: the kernel has to remember where in RAM it keeps which data, i.e. it has to build a mapping table (the page table) from virtual to physical memory. If the memory is large and the application's memory demands are large too (typically an Oracle DB), there are many pages and the mapping table is big, which noticeably reduces performance.
Example: if the machine has 8 GB of RAM cut into 4 kB pages, that is roughly 2 million memory pages the kernel has to search through when it looks up data for a process. On top of that, every page table entry takes 8 bytes….
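The arithmetic behind that example can be checked with a few lines of bash. This is only a sketch: the function names `page_count` and `page_table_bytes` are made up for illustration, and the 8 bytes per entry is the figure quoted above.

```shell
#!/bin/bash
# Number of pages needed to map a given amount of RAM.
# Arguments: RAM size in bytes, page size in bytes.
page_count() {
    echo $(( $1 / $2 ))
}

# Size of the mapping table in bytes, assuming 8 bytes per entry.
page_table_bytes() {
    echo $(( $(page_count "$1" "$2") * 8 ))
}

RAM=$(( 8 * 1024 * 1024 * 1024 ))              # 8 GB
page_count "$RAM" $(( 4 * 1024 ))              # 4 kB pages
page_count "$RAM" $(( 2 * 1024 * 1024 ))       # 2 MB pages
page_table_bytes "$RAM" $(( 4 * 1024 ))        # table size with 4 kB pages
```

It prints 2097152 pages for 4 kB pages, 4096 pages for 2 MB pages, and 16777216 bytes (16 MB) of table entries in the 4 kB case.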
The ideal solution is to increase the page size and thereby reduce the number of pages. Increasing the size gives us hugepages, and we are home :). On Solaris they are called Large Pages and on BSD Super Pages.
Configuring hugepages in the kernel
grep -i hugepages /proc/meminfo tells me whether I am using hugepages and, if so, which size and how many. Output:
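AnonHugePages: 2048 kB = "Non-file backed huge pages mapped into userspace page tables", i.e. anonymous memory currently backed by transparent hugepages (see below); it is not part of the pool configured here.
HugePages_Total: 5510 = total number of pages in the pool reserved by the kernel (set via vm.nr_hugepages).
HugePages_Free: 264 = free pages in the pool, not yet allocated by any process.
HugePages_Rsvd: 5 = "reserved": pages that have been promised to some process or processes but not yet actually allocated.
HugePages_Surp: 0 = "surplus": pages in the pool beyond the vm.nr_hugepages value; their maximum is controlled by vm.nr_overcommit_hugepages.
Hugepagesize: 2048 kB = configured size of a single hugepage (here 2 MB).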
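To turn those counters into something readable, a small awk wrapper can do the subtraction and the conversion to megabytes. This is a minimal sketch; `hugepage_usage` is a hypothetical helper name, and the sample input is simply the output shown above.

```shell
#!/bin/bash
# Summarise the hugepage pool from /proc/meminfo-style input on stdin.
hugepage_usage() {
    awk '
        /^HugePages_Total:/ { total   = $2 }
        /^HugePages_Free:/  { free_pg = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            printf "used_pages=%d pool_mb=%d free_mb=%d\n",
                   total - free_pg, total * size_kb / 1024, free_pg * size_kb / 1024
        }'
}

# Sample values taken from the output above.
hugepage_usage <<'EOF'
HugePages_Total:    5510
HugePages_Free:      264
HugePages_Rsvd:        5
HugePages_Surp:        0
Hugepagesize:       2048 kB
EOF
```

For the sample it prints used_pages=5246 pool_mb=11020 free_mb=528. On a live system the same function can be fed with `hugepage_usage < /proc/meminfo`.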
Oracle has a little script that works out, from the size of the shared memory segments and the page size, how many hugepages to configure. Here is the script:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk '{print $5}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done
# Finish with results
# Note: only the kernel versions listed below are recognized; on any other
# kernel the script reports "Unrecognized kernel version" and prints no value.
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6' | '3.8' | '3.10' | '4.1' ) echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac
# End
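The core of that script is a short piece of integer arithmetic. The sketch below repeats it with the segment sizes passed in explicitly, so it can be tried without any running shared memory segments. `recommended_hugepages` and the example sizes are assumptions for illustration, not part of Oracle's script.

```shell
#!/bin/bash
# Same arithmetic as the script above, but the segment sizes are given as
# arguments instead of being read from `ipcs -m`.
# Arguments: hugepage size in kB, then zero or more segment sizes in bytes.
recommended_hugepages() {
    local hpg_kb=$1; shift
    local num_pg=1 seg min_pg          # start at 1 to keep one hugepage free
    for seg in "$@"; do
        min_pg=$(( seg / (hpg_kb * 1024) ))
        if [ "$min_pg" -gt 0 ]; then   # segments smaller than one hugepage are skipped
            num_pg=$(( num_pg + min_pg + 1 ))
        fi
    done
    echo "$num_pg"
}

# Hypothetical input: one 4 GiB segment and one 64 kB segment, 2 MB hugepages.
recommended_hugepages 2048 4294967296 65536
```

This prints 2050: the 4 GiB segment needs 2048 pages plus one spare, the 64 kB segment is ignored, and one page is kept free from the start.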
Enabling hugepages in the kernel itself is done in /etc/sysctl.conf, where you add for example:
vm.nr_hugepages=62
and reload the settings with sysctl -p
In any case, you need to check the security limits for memory (the memlock limit in /etc/security/limits.conf). If hugepages are enabled and their number and capacity collide with the memory limits, the application will not be able to use them (or will not start at all).
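A quick sanity check for that collision could look like the sketch below. It assumes the limit in question is memlock, which `ulimit -l` reports in kB or as "unlimited"; `memlock_covers_pool` is a hypothetical helper and the numbers are only illustrative.

```shell
#!/bin/bash
# Returns 0 if the memlock limit covers the whole hugepage pool, 1 otherwise.
# Arguments: memlock limit as printed by `ulimit -l` (kB or "unlimited"),
#            number of hugepages, hugepage size in kB.
memlock_covers_pool() {
    local limit=$1 pages=$2 size_kb=$3
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge $(( pages * size_kb )) ]
}

# Illustrative values: 62 hugepages of 2048 kB need 126976 kB of lockable memory.
if memlock_covers_pool 65536 62 2048; then
    echo "memlock limit is sufficient"
else
    echo "memlock limit is too low for the hugepage pool"
fi
```

With a 65536 kB limit the check reports that the limit is too low. On a real machine the first two arguments would come from `ulimit -l` (run as the database user) and `sysctl -n vm.nr_hugepages`.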
Here is a nice awk script by Franck Pachot that shows the current hugepages usage:
awk '/Hugepagesize:/{p=$2} / 0 /{next} / kB$/{v[sprintf("%9d GB %-s",int($2/1024/1024),$0)]=$2;next}
{h[$0]=$2} /HugePages_Total/{hpt=$2} /HugePages_Free/{hpf=$2}
{h["HugePages Used (Total-Free)"]=hpt-hpf} END{for(k in v)
print sprintf("%-60s %10d",k,v[k]/p); for (k in h) print sprintf("%9d GB %-s",p*h[k]/1024/1024,k)}' /proc/meminfo|sort -nr|grep --color=auto -iE "^|( HugePage)[^:]*" #awk #meminfo
Transparent hugepages
What are transparent hugepages? I don't understand them properly, but it looks like they are not hugepages reserved up front; the kernel "fakes" them at runtime instead. Memory is still handed out in 4 kB pages, and the kernel tries in the background to find or create a contiguous run of 4 kB pages and merge it into a hugepage. Sometimes it works, but occasionally it causes serious trouble, up to and including memory leaks.
It is better to skip transparent hugepages and not complicate your life. They have 3 modes of operation (always/madvise/never), and the mode is set like this:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Transparent hugepages probably make sense in some cases, but for DB servers it is ALWAYS RECOMMENDED to turn them off ("For databases generally, fixed sized HugePages are needed, which Transparent HugePages do not provide").
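To see which mode is active, read the same file back: the kernel lists all three modes and puts the current one in square brackets. A minimal sketch for pulling out the bracketed word (`thp_active_mode` is a hypothetical helper name):

```shell
#!/bin/bash
# Extract the active mode (the bracketed word) from the contents of
# /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_active_mode "always madvise [never]"
thp_active_mode "[always] madvise never"
```

The two calls print never and always. On a live system it would be called as `thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"`. Note that the echo into /sys does not survive a reboot, so the setting has to be reapplied at boot.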
Hugepages and Postgres
Using hugepages with Postgres means a significant performance increase,
see this benchmark. Note that this only holds when the database fits into shared buffers (the amount of RAM "locked" and set aside for Postgres when the instance starts). So if shared_buffers is set correctly and hugepages are used, you can expect a fast DB.
Postgres supports hugepages in its main config file (postgresql.conf):
huge_pages = on # on, off, or try
If it is set to "on" and hugepages are not available in the system, the DB will not even start. If it is "try", Postgres checks at startup whether hugepages are available and, if not, falls back to working with "normal" 4 kB memory pages. Off is off.
huge_pages can be set in this config file (followed by a Postgres restart; a reload is not enough for this parameter), or via SQL:
psql> alter system set huge_pages = on;
All right, and what value should hugepages be set to for Postgres? A clear guide is given by
the Postgres documentation (huge pages, at the very end). The principle is that you start Postgres, look at the memory peak (VmPeak) of the Postgres PID and divide it by the configured hugepage size (= the number of hugepages to pass to sysctl):
sysctl -w vm.nr_hugepages=XXXXX
echo -n "vm.nr_hugepages = " > /etc/sysctl.d/01-huge_pages.conf
cat /proc/$(ps -fu postgres | grep /usr/lib/postgres | awk '{print $2}')/status | grep VmPeak | awk '{printf("%d\n",$2/2048 + 0.9)}' >> /etc/sysctl.d/01-huge_pages.conf
sysctl -p /etc/sysctl.d/01-huge_pages.conf
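The division in that one-liner hardcodes a 2048 kB hugepage and rounds by adding 0.9. A slightly more explicit sketch takes the hugepage size as a parameter and uses integer ceiling division. `hugepages_for_vmpeak` is a hypothetical helper and the VmPeak value is only an illustration.

```shell
#!/bin/bash
# Number of hugepages needed to cover a process's peak memory, rounded up.
# Arguments: VmPeak in kB, hugepage size in kB.
hugepages_for_vmpeak() {
    local vmpeak_kb=$1 hpg_kb=$2
    echo $(( (vmpeak_kb + hpg_kb - 1) / hpg_kb ))
}

# Illustrative VmPeak of 6490428 kB with 2048 kB hugepages.
hugepages_for_vmpeak 6490428 2048
```

This prints 3170. On a real system the second argument can be read from the Hugepagesize line of /proc/meminfo instead of being typed in.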
Dobo
2 Apr 20 at 8:44
postgresql hugepages definitive guide
https://wiki.postgresql.org/images/7/7d/PostgreSQL_and_Huge_pages_-_PGConf.2019.pdf
dobo
15 Mar 24 at 14:28