pt-galera-log-explainer¶
Filter, aggregate and summarize multiple Galera logs together. This is a toolbox to help navigate Galera logs.
Usage¶
pt-galera-log-explainer [--since=] [--until=] [-vv] [--merge-by-directory] [--pxc-operator] <command> <paths ...>
Commands available¶
list¶
pt-galera-log-explainer [flags] list { --all | [--states] [--views] [--events] [--sst] [--applicative] } <paths ...>
List key events in chronological order from any number of nodes (SST, view changes, general errors, maintenance operations). It aggregates logs together by identifying them using node names, IPs and internal Galera identifiers.
It can be from a single node:
pt-galera-log-explainer list --all --since 2023-01-05T03:24:26.000000Z /var/log/mysql/*.log
or from multiple nodes.
pt-galera-log-explainer list --all *.log
You can filter by event type:
pt-galera-log-explainer list --sst --views *.log
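The time window used with list takes RFC3339 timestamps for --since and --until. The following is a minimal sketch of building such a window in a shell; the variable names and the log path are illustrative assumptions, and the command is printed rather than executed so the sketch runs without the tool installed.

```shell
# Start of the window, written by hand in RFC3339 (UTC, "Z" suffix).
since_ts="2023-01-23T03:53:40Z"

# End of the window: the current time in UTC, in the same format.
until_ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Assemble the invocation; the glob is kept literal inside the quotes.
cmd="pt-galera-log-explainer --since=$since_ts --until=$until_ts list --all /var/log/mysql/*.log"

# Print the command instead of running it.
echo "$cmd"
```

Copy the printed line into a shell where the tool is available, and adjust the path to the actual log location.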
whois¶
Find information about a node from any identifier (node name, IP or UUID)
pt-galera-log-explainer [flags] whois [--json] [--type { nodename | ip | uuid | auto }] <information to search> <paths ...>
pt-galera-log-explainer whois '218469b2' mysql.log
pt-galera-log-explainer whois '172.17.0.3' mysql.log
pt-galera-log-explainer whois 'galera-node2' mysql.log
conflicts¶
List all replication failure votes (Galera 4)
pt-galera-log-explainer conflicts [--json|--yaml] *.log
ctx¶
Get the context the tool built for a single log. It contains everything the tool extracted from the log file: version, SST information, known uuid-ip-nodename mappings, …
pt-galera-log-explainer ctx mysql.log
regex-list¶
Print every implemented regex:
* regex: the regex that will be used against the log files
* internalRegex: the golang regex that will be used to extract pieces of information
* type: the regex group it belongs to
* verbosity: the required level of verbosity at which it will be printed
pt-galera-log-explainer regex-list
Available flags¶
-h, --help
Show help and exit.
--no-color
Remove all color special characters.
--since
Only list events after this date. It will affect the regex applied to the logs. Format: 2023-01-23T03:53:40Z (RFC3339)
--until
Only list events before this date. This is only implemented in the tool loop, it does not alter regexes. Format: 2023-01-23T03:53:40Z (RFC3339)
--merge-by-directory
Instead of relying on extracted information, logs will be merged by their base directory. It is useful when logs are very sparse and already organized by nodes.
--skip-merge
Disable the ability to merge log files together. Can be used when every node has the same wsrep_node_name.
-v, --verbosity
-v: display in the timeline every mysql info the tool used
-vv: internal tool debug
--pxc-operator
Analyze logs from Percona PXC operator. Operator logs should be automatically detected (see --skip-operator-detection). It will prevent logs from being merged together, add operator specific regexes, and fine-tune regexes for logs taken from pt-k8s-debug-collector. Off by default because it negatively impacts performance for non-k8s setups.
--skip-operator-detection
Disable automatic detection of PXC operator logs. When detected, a message will be shown. Detection is done using a prefix regex.
--exclude-regexes
Remove regexes from analysis. Use pt-galera-log-explainer regex-list | jq . to get the list.
--grep-cmd
grep v3 binary command path. On Darwin systems, it may need to be set to ggrep. Default: grep
--version
Show version and exit.
--custom-regexes
Add custom regexes, printed in magenta. Format: (golang regex string)=[optional static message to display]. If the static message is left empty, the captured string will be printed instead. Custom regexes are separated by semicolons. Example:
--custom-regexes="Page cleaner took [0-9]*ms to flush [0-9]* pages=;doesn't recommend.*pxc_strict_mode=unsafe query used"
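The --custom-regexes value can be assembled from separate pattern and message pairs, as in the minimal sketch below. The variable names and the sample log line are hypothetical. The local check uses grep -E, which is not the Go regex engine the tool uses, so it is only a rough sanity check for simple patterns like these. The final command is printed rather than executed.

```shell
# Pattern/message pairs. An empty message makes the tool print the
# captured string instead of a static message.
p1='Page cleaner took [0-9]*ms to flush [0-9]* pages'
m1=''
p2="doesn't recommend.*pxc_strict_mode"
m2='unsafe query used'

# Hypothetical log line, used only to sanity-check the first pattern locally.
sample='2023-01-23T03:53:40.000000Z 0 [Note] [InnoDB] Page cleaner took 4012ms to flush 120 pages'
if printf '%s\n' "$sample" | grep -Eq "$p1"; then
  echo "pattern 1 matches the sample line"
fi

# Join each pair with "=" and separate the pairs with ";".
custom="$p1=$m1;$p2=$m2"

# Print the resulting invocation instead of running it.
echo "pt-galera-log-explainer list --all --custom-regexes=\"$custom\" mysql.log"
```

Keeping patterns and messages in separate variables makes it easier to see which "=" separates the regex from its message, since the flag value itself is a single string.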
Example outputs¶
$ pt-galera-log-explainer list --all --no-color --since=2023-03-12T19:41:28.493046Z --until=2023-03-12T19:44:59.855491Z tests/logs/upgrade/*
identifier 172.17.0.2 node2 tests/logs/upgrade/node3.log
current path tests/logs/upgrade/node1.log tests/logs/upgrade/node2.log tests/logs/upgrade/node3.log
last known ip 172.17.0.2
last known name node2
mysql version 8.0.28
2023-03-12T19:41:28.493046Z starting(8.0.28) | |
2023-03-12T19:41:28.500789Z started(cluster) | |
2023-03-12T19:43:17.630191Z | node3 joined |
2023-03-12T19:43:17.630208Z node3 joined | |
2023-03-12T19:43:17.630221Z node2 joined | |
2023-03-12T19:43:17.630243Z | node1 joined |
2023-03-12T19:43:17.634138Z | | node2 joined
2023-03-12T19:43:17.634229Z | | node1 joined
2023-03-12T19:43:17.643210Z | PRIMARY(n=3) |
2023-03-12T19:43:17.648163Z | | PRIMARY(n=3)
2023-03-12T19:43:18.130088Z CLOSED -> OPEN | |
2023-03-12T19:43:18.130230Z PRIMARY(n=3) | |
2023-03-12T19:43:18.130916Z OPEN -> PRIMARY | |
2023-03-12T19:43:18.904410Z will receive IST(seqno:178226792) | |
2023-03-12T19:43:18.913328Z | | node1 cannot find donor
2023-03-12T19:43:18.913429Z node1 cannot find donor | |
2023-03-12T19:43:18.913565Z | node1 cannot find donor |
2023-03-12T19:43:19.914122Z | | node1 cannot find donor
2023-03-12T19:43:19.914259Z node1 cannot find donor | |
2023-03-12T19:43:19.914362Z | node1 cannot find donor |
2023-03-12T19:43:20.914957Z | | (repeated x97)node1 cannot find donor
2023-03-12T19:43:20.915143Z (repeated x97)node1 cannot find donor | |
2023-03-12T19:43:20.915262Z | (repeated x97)node1 cannot find donor |
2023-03-12T19:44:58.999603Z | | node1 cannot find donor
2023-03-12T19:44:58.999791Z node1 cannot find donor | |
2023-03-12T19:44:58.999891Z | node1 cannot find donor |
2023-03-12T19:44:59.817822Z timeout from donor in gtid/keyring stage | |
2023-03-12T19:44:59.839692Z SST error | |
2023-03-12T19:44:59.840669Z | | node2 joined
2023-03-12T19:44:59.840745Z | | node1 left
2023-03-12T19:44:59.840933Z | node3 joined |
2023-03-12T19:44:59.841034Z | node1 left |
2023-03-12T19:44:59.841189Z NON-PRIMARY(n=1) | |
2023-03-12T19:44:59.841292Z PRIMARY -> OPEN | |
2023-03-12T19:44:59.841352Z OPEN -> CLOSED | |
2023-03-12T19:44:59.841515Z terminated | |
2023-03-12T19:44:59.841529Z former SST cancelled | |
2023-03-12T19:44:59.848349Z | | node1 left
2023-03-12T19:44:59.848409Z | | PRIMARY(n=2)
2023-03-12T19:44:59.855443Z | node1 left |
2023-03-12T19:44:59.855491Z | PRIMARY(n=2) |
$ pt-galera-log-explainer whois 172.17.0.2 --no-color tests/logs/upgrade/*
ip:
└── 172.17.0.2
├── nodename:
│ └── node1 (2023-03-12 19:35:07.644683 +0000 UTC)
│
└── uuid:
├── 1d3ea8f5 (2023-03-12 07:24:13.789261 +0000 UTC)
├── 54ab931e (2023-03-12 07:43:08.563339 +0000 UTC)
├── fecde235 (2023-03-12 08:46:48.963504 +0000 UTC)
├── a07872e1 (2023-03-12 08:49:41.206124 +0000 UTC)
├── 60da0bf9-aa9c (2023-03-12 12:29:48.873397 +0000 UTC)
├── 35b62086-902c (2023-03-12 13:04:23.979636 +0000 UTC)
├── ca2c2a5f-a82a (2023-03-12 19:35:05.878879 +0000 UTC)
└── eefb9c8a-b69a (2023-03-12 19:43:17.133756 +0000 UTC)
Requirements¶
grep, version 3. On Darwin-based OSes, the bundled grep is only version 2 due to license limitations. --grep-cmd can be used to point to the correct grep binary, usually ggrep.
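A wrapper script can choose the grep binary before calling the tool. This is a minimal sketch, assuming ggrep is the name under which GNU grep is installed on Darwin and that --grep-cmd accepts the --flag=value form. The command is printed rather than executed.

```shell
# Default to the system grep.
grep_cmd=grep

# On Darwin, prefer ggrep when it is present in PATH.
if [ "$(uname -s)" = "Darwin" ] && command -v ggrep >/dev/null 2>&1; then
  grep_cmd=ggrep
fi

# Print the resulting invocation instead of running it.
echo "pt-galera-log-explainer --grep-cmd=$grep_cmd list --all *.log"
```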
Compatibility¶
Percona XtraDB Cluster: 5.5 to 8.0
MariaDB Galera Cluster: 10.0 to 10.6
logs from PXC operator pods (error.log, recovery.log, post.processing.log)
Known issues¶
Nodes sharing the same ip, or nodes with identical names are not supported
Sparse file identification can be missed, resulting in many columns being displayed. --merge-by-directory can be used, but files need to be organized already in separate directories. This mainly happens when the log file does not contain enough information.
Some information will seem to be missing. Depending on the case, it may simply be unimplemented yet, or it was disabled later because it was found to be unreliable (node index numbers are not reliable, for example).
Column widths are sometimes too large to be easily readable. This usually happens when printing SST events with long node names.
When some display corner cases seem broken (events not deduplicated, …), it is because of extra hidden internal events.
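For the sparse-file case above, --merge-by-directory expects each node's logs to sit in their own directory. The sketch below shows one possible layout, built with empty placeholder files in a temporary directory. The directory and file names are hypothetical, and the command is printed rather than executed.

```shell
# Temporary working directory holding one sub-directory per node.
workdir=$(mktemp -d)
mkdir -p "$workdir/node1" "$workdir/node2"

# Empty placeholder files standing in for real, rotated log files.
: > "$workdir/node1/mysqld.log"
: > "$workdir/node1/mysqld.log.1"
: > "$workdir/node2/mysqld.log"

# Files sharing a base directory would be merged into a single column.
echo "pt-galera-log-explainer --merge-by-directory list --all $workdir/*/*"

# Remove the placeholder tree when done: rm -rf "$workdir"
```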