Ancestors

Toot

Written by Somnius :server_tiger: on 2024-09-10 at 23:37

[#]mastoadmin hey folks—I noticed a strange issue that started today: when Elasticsearch is running, the Sidekiq scheduler process gets stuck on an indexing job and slowly eats all my server RAM until the server bricks itself. Has anyone seen this, and do you know how to debug it?

I'm on Hometown 1.1.1 / Mastodon 4.2.10, and I'm using Elasticsearch v7.
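
For context, here's roughly how I've been watching it happen (just a sketch; the process match assumes a standard non-Docker install where Sidekiq shows up in its own process title):

watch -n 10 "ps -eo pid,rss,etime,args --sort=-rss | grep '[s]idekiq'"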

=> More information about this toot | More toots from somnius@merveilles.town

Descendants

Written by Somnius :server_tiger: on 2024-09-10 at 23:38

I want to see what the index scheduler is doing, but the logs aren't really telling me anything: just that the job starts, then runs forever in Sidekiq and never finishes
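
The most I've managed so far is tailing the scheduler's output, something like this (assuming the usual mastodon-sidekiq systemd unit name from a non-Docker install):

journalctl -u mastodon-sidekiq -f | grep -i index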

=> More information about this toot | More toots from somnius@merveilles.town

Written by djm on 2024-09-11 at 02:03

@somnius

Netdata has saved me more than once in this sort of pickle

=> More information about this toot | More toots from djm@merveilles.town

Written by Somnius :server_tiger: on 2024-09-11 at 06:54

@djm This looks great, but I'm having trouble installing it! I've been meaning to get some kind of logging solution going that goes beyond journalctl—but there's so much documentation that it's hard to sift through and figure out how it all works, heh. I'll have to keep this in mind for a rainy day!
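
(For whenever that rainy day comes: their docs point at a one-line kickstart installer; probably worth double-checking the URL against the current install docs before running it.)

wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh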

=> More information about this toot | More toots from somnius@merveilles.town

Written by Grant on 2024-09-11 at 15:08

@somnius Have the JVM options for max heap size changed during the upgrade? I think by default the JVM wants to take all the memory on the host...

An example of specifying 512MB max heap:

ES_JAVA_OPTS="-Xms512m -Xmx512m"
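
If Elasticsearch came from the deb/rpm packages (7.7 or later), the same limit can also go in a drop-in file instead of the environment, e.g. a file like /etc/elasticsearch/jvm.options.d/heap.options (the name is up to you, it just has to end in .options) containing:

-Xms512m
-Xmx512m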

Looking through the GC (garbage collection) logs to see what Java is trying to do might help as well.
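
On package installs, Elasticsearch 7 writes those GC logs next to its regular logs by default, so something like this should show them (the path may differ for tarball or Docker installs):

tail -f /var/log/elasticsearch/gc.log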

I also found this CLI utility that might be handy:

https://github.com/objectrocket/elasticstat
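
(If I remember right it's pip-installable and polls the local node on the default port when run with no arguments:)

pip install elasticstat
elasticstat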

=> More information about this toot | More toots from tuxinator@merveilles.town

Written by Somnius :server_tiger: on 2024-09-11 at 15:54

@tuxinator So at first, they did. However, the problem persists even after setting the max heap size (to 1 gigabyte) and rebooting—specifically, the Sidekiq process running the scheduler runs an IndexScheduler job that balloons to eat all available RAM when left alone. I'm pretty mystified as to why this is happening, honestly!
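
In the meantime I might just cap the Sidekiq unit with a systemd override so it can't take the whole box down (a sketch; the unit name assumes the standard non-Docker setup):

sudo systemctl edit mastodon-sidekiq
# then add:
# [Service]
# MemoryMax=1G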

=> More information about this toot | More toots from somnius@merveilles.town

Written by Grant on 2024-09-11 at 16:53

@somnius Ah okay, so the Lucene indexer uses memory-mapped files outside of the JVM heap.

Have you tried setting the "indices.memory.index_buffer_size" option to set a hard limit on the memory usage for the indexer?
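
For example, in /etc/elasticsearch/elasticsearch.yml (the value here is just illustrative; the default is 10% of the heap, and it accepts absolute sizes too):

indices.memory.index_buffer_size: 128mb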

=> More information about this toot | More toots from tuxinator@merveilles.town

Written by Somnius :server_tiger: on 2024-09-11 at 19:20

@tuxinator Ooh, I was unaware of this config; it's the only lead I have for now. I'll see if this works!

=> More information about this toot | More toots from somnius@merveilles.town

Written by Grant on 2024-09-11 at 17:02

@somnius I'm also curious about the breakdown of memory usage when it reaches 100%. Is it all active, or mostly cache?

=> More information about this toot | More toots from tuxinator@merveilles.town

Written by Somnius :server_tiger: on 2024-09-11 at 19:20

@tuxinator Hm, so when I look at the memory usage, it's not the Elasticsearch process that consumes all the memory; it's the Sidekiq process that does. Elasticsearch only takes up about the size of the JVM heap.

I'm not sure how to answer that question though, since I don't know the difference between "active" and "cache". If "active" means "using actual RAM" rather than hitting the disk, then yes, it's taking up all the actual RAM on my server when this happens!

=> More information about this toot | More toots from somnius@merveilles.town

Written by Grant on 2024-09-11 at 19:35

@somnius The "free -h" command will tell you the breakdown of memory usage.
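
The output looks roughly like this (numbers made up); the "available" column is the one that tells you whether you're actually about to run out:

              total        used        free      shared  buff/cache   available
Mem:          3.8Gi       2.9Gi       120Mi        45Mi       800Mi       650Mi
Swap:            0B          0B          0B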

Here is a funny but actually useful site that explains it more:

https://www.linuxatemyram.com/

Another thing you can check is whether there are OOM events in the system log (journalctl --no-pager | grep -i "out of memory"). That would reveal whether the kernel thinks memory is actually running out and is killing processes.

=> More information about this toot | More toots from tuxinator@merveilles.town
