We've chosen to 'modernize' all of our ZFS filesystems

By: cks
23 April 2025 at 03:17

We are almost all of the way through a multi-month process of upgrading our ZFS fileservers from Ubuntu 22.04 to 24.04, which also involved moving to more recent hardware. This meant migrating all of our pools and filesystems, terabytes of data in total. Our traditional way of doing this sort of migration (which we used, for example, when going from our OmniOS fileservers to our Linux fileservers) was the good old reliable 'zfs send | zfs receive' approach of sending snapshots over. This sort of migration is fast, reliable, and straightforward. However, it has one drawback, which is that it preserves all of the old filesystem's history, including things like whatever might be behind occasional kernel panics and possibly other lurking issues.
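
As an illustration, the general shape of that kind of migration is something like the following sketch. The pool and filesystem names here are made up, and a real migration normally finishes with one or more incremental sends to catch up on recent changes:

# zfs snapshot -r oldpool/homes@migrate
# zfs send -R oldpool/homes@migrate | ssh newserver zfs receive -F newpool/homes
# zfs snapshot -r oldpool/homes@final
# zfs send -R -i @migrate oldpool/homes@final | ssh newserver zfs receive -F newpool/homes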

We've been running ZFS for long enough that we had some ZFS filesystems that were still at ZFS filesystem version 4. In late 2023, we upgraded them all to ZFS filesystem version 5, and after that we got some infrequent kernel panics. We could never reproduce the kernel panics and they were very infrequent, but 'infrequent' is not the same as 'never' (the previous state of affairs), and it seemed likely that they were in some way related to upgrading our filesystem versions, which in turn was related to us having some number of very old filesystems. So in this migration, we deliberately decided to 'migrate' filesystems the hard way. Which is to say, rather than migrating the filesystems, we migrated the data with user level tools, moving it into pools and filesystems that were created from scratch on our new Ubuntu 24.04 fileservers (which led us to discover that default property values sometimes change in ways that we care about).
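
By 'migrating the data with user level tools' I mean something with the general shape of the following sketch; the names are made up, and a real migration involves more care (and more verification) than this:

# zfs create newpool/homes
# rsync -aHAX --numeric-ids oldserver:/h/homes/ /h/homes/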

(The filesystems reused the same names as their old versions, because that keeps things easier for our people and for us.)

It's possible that this user level rewriting of all data has wound up laying things out in a better way (although all of this is on SSDs), and it's certainly ensured that everything has modern metadata associated with it and so on. The 'fragmentation' value of the new pools on the new fileservers is certainly rather lower than the value for most old pools, although what that means is a bit complicated.
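
(If you want to look at this on your own pools, 'fragmentation' is a regular pool property that you can see with zpool list, for example:)

# zpool list -o name,size,allocated,fragmentation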

There's a bit of me that misses the deep history of our old filesystems, some of which dated back to our first generation Solaris ZFS fileservers. However, on the whole I'm happy that we're now using filesystems that don't have ancient historical relics and peculiarities that may not be well supported by OpenZFS's code any more (and which were only likely to get less tested and more obscure over time).

(Our pools were all (re)created from scratch as part of our migration from OmniOS to Linux, and anyway would have been remade from scratch again in this migration even if we moved the filesystems with 'zfs send'.)

ZFS's delayed compression of written data (when compression is enabled)

By: cks
15 April 2025 at 02:23

In a comment on my entry about how Unix files have at least two sizes, Leah Neukirchen said that 'ZFS compresses asynchronously' and noted that this could cause the reported block size of a just-written file to change over time. This way of describing ZFS's behavior made me twitch and it took me a bit of thinking to realize why. What ZFS does is delayed compression (which is asynchronous with your user level write() calls), but not true 'asynchronous compression' that happens later at an unpredictable time.

Like basically all filesystems, ZFS doesn't immediately start writing data to disk when you do a write() system call. Instead it buffers this data in memory for a while and only writes it later. As part of this, ZFS doesn't immediately decide where on disk the data will be written (this is often called 'delayed allocation' and is common in many filesystems) and otherwise prepare it to be written out. As part of this delayed allocation and preparation, ZFS doesn't immediately compress your written data, and as a result ZFS doesn't know how many disk blocks your data will take up. Instead your data is only compressed and has disk blocks allocated for it as part of ZFS's pipeline of actually performing IO, when the data is flushed to disk, and only then is its physical block size known.
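
You can sometimes see this from user level if you look quickly. As a hedged sketch (the exact behavior depends on timing and your OpenZFS version), on a filesystem with compression enabled:

# dd if=/dev/zero of=testfile bs=128k count=8
# ls -ls testfile          # block count right after the write
# zpool sync
# ls -ls testfile          # block count after the data has been flushed out

With /dev/zero the final allocated size will be essentially nothing, since with compression enabled ZFS recognizes all-zero blocks and doesn't allocate anything for them.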

However, once written to disk, the data's compression or lack of it is never changed (nor is anything else about it; ZFS never modifies data once it's written). For example, data isn't initially written in uncompressed form and then asynchronously compressed later. Nor is there anything that goes around asynchronously compressing or decompressing data if you turn on or off compression on a ZFS filesystem (or change the compression algorithm). This periodically irks people who wish they could turn compression on on an existing filesystem, or change the compression algorithm, and have this take effect 'in place' to shrink the amount of space the filesystem is using.
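
If you do want existing data to pick up a new compression setting, you have to rewrite it yourself somehow. A blunt sketch of one way (the names are made up, and you need the spare space and the downtime):

# zfs set compression=zstd tank/data
# cp -a /tank/data/bigfile /tank/data/bigfile.new
# mv /tank/data/bigfile.new /tank/data/bigfile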

Delaying compressing data until you're writing it out is a sensible decision for a variety of reasons. One of them is that ZFS compresses your data in potentially large chunks, and you may not write() all of that chunk at once. If you wrote half a chunk now and the other half later before it got flushed to disk, it would be a waste of effort to compress your half a chunk now and then throw away that work when you compressed the whole chunk.

(I also suspect that it was simpler to add compression to ZFS as part of its IO pipeline than to do it separately. ZFS already had a multi-stage IO pipeline, so adding compression and decompression as another step was probably relatively straightforward.)

How ZFS knows and tracks the space usage of datasets

By: cks
19 March 2025 at 02:44

Anyone who's ever had to spend much time with 'zfs list -t all -o space' knows the basics of ZFS space usage accounting, with space used by the datasets, data unique to a particular snapshot (the 'USED' value for a snapshot), data used by snapshots in total, and so on. But today I discovered that I didn't really know how it all worked under the hood, so I went digging in the source code. The answer is that ZFS tracks all of these types of space usage directly as numbers, and updates them as blocks are logically freed.
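
As a reminder, the '-o space' shorthand expands to the per-dataset space breakdown; the dataset name here is made up and the numbers elided:

# zfs list -r -t all -o space tank/homes
NAME            AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
[...]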

(Although all of these are accessed from user space as ZFS properties, they're not conventional dataset properties; instead, ZFS materializes the property version any time you ask, from fields in its internal data structures. Some of these fields are different and accessed differently for snapshots and regular datasets, for example what 'zfs list' presents as 'USED'.)

All changes to a ZFS dataset happen in a ZFS transaction (group), which are assigned ever increasing numbers, the 'transaction group number(s)' (txg). This includes allocating blocks, which remember their 'birth txg', and making snapshots, which carry the txg they were made in and necessarily don't contain any blocks that were born after that txg. When ZFS wants to free a block in the live filesystem (either because you deleted the object or because you're writing new data and ZFS is doing its copy on write thing), it looks at the block's birth txg and the txg of the most recent snapshot; if the block is old enough that it has to be in that snapshot, then the block is not actually freed and the space for the block is transferred from 'USED' (by the filesystem) to 'USEDSNAP' (used only in snapshots). ZFS will then further check the block's txg against the txgs of snapshots to see if the block is unique to a particular snapshot, in which case its space will be added to that snapshot's 'USED'.

ZFS goes through a similar process when you delete a snapshot. As it runs around trying to free up the snapshot's space, it may discover that a block it's trying to free is now used only by one other snapshot, based on the relevant txgs. If so, the block's space is added to that snapshot's 'USED'. If the block is freed entirely, ZFS will decrease the 'USEDSNAP' number for the entire dataset. If the block is still used by several snapshots, no usage numbers need to be adjusted.

(Determining if a block is unique in the previous snapshot is fairly easy, since you can look at the birth txgs of the two previous snapshots. Determining if a block is now unique in the next snapshot (or for that matter is still in use in the dataset) is more complex and I don't understand the code involved; presumably it involves somehow looking at what blocks were freed and when. Interested parties can look into the OpenZFS code themselves, where there are some surprises.)

PS: One consequence of this is that there's no way after the fact to find out when space shifted from being used by the filesystem to used by snapshots (for example, when something large gets deleted in the filesystem and is now present only in snapshots). All you can do is capture the various numbers over time and then look at your historical data to see when they changed. The removal of snapshots is captured by ZFS pool history, but as far as I know this doesn't capture how the deletion affected the various space usage numbers.
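
A minimal sketch of capturing the numbers yourself, say from cron (where the log goes and which properties you track are up to you):

# date +%s >> /var/log/zfs-space.log
# zfs list -Hp -t all -o name,used,usedbysnapshots,usedbydataset,avail >> /var/log/zfs-space.log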

Using a small ZFS recordsize doesn't save you space (well, almost never)

By: cks
26 September 2024 at 01:54

ZFS filesystems have a famously confusing 'recordsize' property, which in the past I've summarized as the maximum logical block size of a filesystem object. Sometimes I've seen people suggest that if you want to save disk space, you should reduce your 'recordsize' from the default 128 KBytes. This is almost invariably wrong; in fact, setting a low 'recordsize' is more likely to cost you space.

How a low recordsize costs you space is straightforward. In ZFS, every logical block requires its own block pointer (with its DVAs) to point to it and to hold its checksum. The more logical blocks you have, the more block pointers you require and the more space they take up. As you decrease the 'recordsize' of a filesystem, files (well, filesystem objects in general) that are larger than your recordsize will use more and more logical blocks for their data and so more and more block pointers, taking up more and more space.

In addition, ZFS compression operates on logical blocks and must save at least one disk block's worth of space to be considered worthwhile. If you have compression turned on (and if you care about space usage, you should), the closer your 'recordsize' gets to the vdev's disk block size, the harder it is for compression to save space. The limit case is when you make 'recordsize' be the same size as the disk block size, at which point ZFS compression can't do anything.

(This is the 'physical disk block size', or more exactly the vdev's 'ashift', which these days should basically always be 4 KBytes or greater, not the disk's 'logical block size', which is usually still 512 bytes.)

The one case where a large recordsize can theoretically cost you disk space is if you have large files that are mostly holes and you don't have any sort of compression turned on (which these days means specifically turning it off). If you have a (Unix) file that has 1 KByte of data every 128 KBytes and is otherwise not written to, without compression and with the default 128 KByte 'recordsize', you'll get a bunch of 128 KByte blocks that have 1 KByte of actual data and 127 KBytes of zeroes. If you reduced your 'recordsize', you would still waste some space but more of it would be actual holes, with no space allocated. However, even the most minimal compression (a setting of 'compression=zle') will entirely eliminate this waste.
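
If you want to see this for yourself, here's a sketch on a made-up scratch filesystem ('seek=' in dd leaves a hole in the middle of the file, and the copy at the end is needed because changing 'compression' only affects newly written data):

# zfs set compression=off tank/scratch
# dd if=/dev/urandom of=/tank/scratch/sparse bs=1k count=1
# dd if=/dev/urandom of=/tank/scratch/sparse bs=1k count=1 seek=128 conv=notrunc
# zpool sync
# du -h /tank/scratch/sparse          # roughly 256 KBytes allocated for 2 KBytes of data
# zfs set compression=zle tank/scratch
# cp /tank/scratch/sparse /tank/scratch/sparse2
# zpool sync
# du -h /tank/scratch/sparse2         # the zero-filled space mostly disappears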

(The classical case of reducing 'recordsize' is helping databases out. More generally, you reduce 'recordsize' when you're rewriting data in place in small sizes (such as 4 KBytes or 16 KBytes) or appending data to a file in small sizes, because ZFS can only read and write entire logical blocks.)
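
(The setting itself is straightforward and only affects blocks written afterward; for example, for a hypothetical filesystem holding a database with 16 KByte pages:)

# zfs set recordsize=16K tank/db
# zfs get recordsize tank/db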

PS: If you need a small 'recordsize' for performance, you shouldn't worry about the extra space usage, partly because you should also have a reasonable amount of free disk space to improve the performance of ZFS's space allocation.

ZFS properties sometimes change their default values over time

By: cks
12 August 2024 at 02:51

For an assortment of reasons, we don't want ZFS to do compression on most of the filesystems on our fileservers. Some of these reasons are practical technical ones and some of them have to do with our particular local non-technical ('political') decisions around disk space allocation. Traditionally we've done this by the simple mechanism of not specifically enabling compression, because the default was off. Recently I discovered, more or less by coincidence, that OpenZFS had changed the default for ZFS compression from off to on between the version in Ubuntu 22.04 ('v2.1.5' plus Ubuntu changes) and the version in Ubuntu 24.04 ('v2.2.2' plus Ubuntu changes).

(This change was made in early March of 2022 and first appeared in v2.2.0. The change itself is discussed in pull request #13078.)

Another property that changed its default value in OpenZFS v2.2.0 is 'relatime'. This was apparently a change to match general Linux behavior, based on pull request #13614. Since we already specifically turn atime off, we might want to also disable relatime now that it defaults to on, or perhaps it won't have too much of an impact (and in general, atime and relatime may not work over NFS anyway).

These aren't big changes (and they're perfectly sensible ones), but to me they point to what should really have already been obvious, which is that OpenZFS can change the default values of properties over time. When you move to a new version of ZFS, you'll probably inherit these new default values unless you're explicitly setting the properties to something. If you care about various properties having specific values, it's probably worth explicitly setting those values even if they're the current default.
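
For us that means explicitly pinning the values we care about instead of trusting the defaults, something like this (the filesystem name is made up):

# zfs set compression=off tank/fs
# zfs set atime=off tank/fs
# zfs get -o name,property,value,source compression,atime,relatime tank/fs

The 'source' column is the useful bit here; it tells you whether a value is explicitly set ('local'), inherited, or just the current default.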

(To be explicit, I think that OpenZFS should make this sort of change to defaults when they have good reasons, which I feel they definitely did here. Our issues with compression are unusual and specific to our environment, and dealing with them is our problem.)

Some things on how ZFS System Attributes are stored

By: cks
19 June 2024 at 03:23

To summarize, ZFS's System Attributes (SAs) are a way for ZFS to pack a somewhat arbitrary collection of additional information, such as the parent directory of things and symbolic link targets, into ZFS dnodes in a general and flexible way that doesn't hard code the specific combinations of attributes that can be used together. ZFS system attributes are normally stored in extra space in dnodes that's called the bonus buffer, but the system attributes can overflow to a spill block if necessary. I've written more about the high level side of this in my entry on ZFS SAs, but today I'm going to write up some concrete details of what you'd see when you look at a ZFS filesystem with tools like zdb.

When ZFS stores the SAs for a particular dnode, it simply packs all of their values together in a blob of data. It knows which part of the blob is which through an attribute layout, which tells it which attributes are in the layout and in what order. Attribute layouts are created and registered as they are needed, which is to say when some dnode wants to use that particular combination of attributes. Generally there are only a few combinations of system attributes that get used, so a typical ZFS filesystem will not have many SA layouts. System attributes are numbered, but the specific numbering may differ from filesystem to filesystem. In practice it probably mostly won't, since most attributes usually get registered pretty early in the life of a ZFS filesystem and in a predictable order.

(For example, the creation of a ZFS filesystem necessarily means creating a directory dnode for its top level, so all of the system attributes used for directories will immediately get registered, along with an attribute layout.)

The attribute layout for a given dnode is not fixed when the file is created; instead, it varies depending on what system attributes that dnode needs at the moment. The high level ZFS code simply sets or clears specific system attributes on the dnode, and the low(er) level system attribute code takes care of either finding or creating an attribute layout that matches the current set of attributes the dnode has. Many system attributes are constant over the life of the dnode, but I think others can come and go, such as the system attributes used for xattrs.

Every ZFS filesystem with system attributes has three special dnodes involved in this process, which zdb will report as the "SA master node", the "SA attr registration" dnode, and the "SA attr layouts" dnode. As far as I know, the SA master node's current purpose is to point to the other two dnodes. The SA attribute registry dnode is where the potentially filesystem specific numbers for attributes are registered, and the SA attribute layouts dnode is where the various layouts in use on the filesystem are tracked. The SA master (d)node itself is pointed to by the "ZFS master node", which is always object 1.

So let's use zdb to take a look at a typical case:

# zdb -dddd fs19-scratch-01/w/430 1
[...]
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        1    1   128K    512     8K     512    512  100.00  ZFS master node
[...]
               SA_ATTRS = 32 
[...]
# zdb -dddd fs19-scratch-01/w/430 32
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       32    1   128K    512      0     512    512  100.00  SA master node
[...]
               LAYOUTS = 36 
               REGISTRY = 35 

It's common for the registry and the layout to be consecutive, since they're generally allocated at the same time. On most filesystems they will have very low object numbers, since they were created when the filesystem was.

The registry is generally going to be pretty boring looking:

# zdb -dddd fs19-scratch-01/w/430 35
[...]
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       35    1   128K  1.50K     8K     512  1.50K  100.00  SA attr registration
[...]
       ZPL_SCANSTAMP =  20030012 : [32:3:18]
       ZPL_RDEV =  800000a : [8:0:10]
       ZPL_FLAGS =  800000b : [8:0:11]
       ZPL_GEN =  8000004 : [8:0:4]
       ZPL_MTIME =  10000001 : [16:0:1]
       ZPL_CTIME =  10000002 : [16:0:2]
       ZPL_XATTR =  8000009 : [8:0:9]
       ZPL_UID =  800000c : [8:0:12]
       ZPL_ZNODE_ACL =  5803000f : [88:3:15]
       ZPL_PROJID =  8000015 : [8:0:21]
       ZPL_ATIME =  10000000 : [16:0:0]
       ZPL_SIZE =  8000006 : [8:0:6]
       ZPL_LINKS =  8000008 : [8:0:8]
       ZPL_PARENT =  8000007 : [8:0:7]
       ZPL_MODE =  8000005 : [8:0:5]
       ZPL_PAD =  2000000e : [32:0:14]
       ZPL_DACL_ACES =  40013 : [0:4:19]
       ZPL_GID =  800000d : [8:0:13]
       ZPL_CRTIME =  10000003 : [16:0:3]
       ZPL_DXATTR =  30014 : [0:3:20]
       ZPL_DACL_COUNT =  8000010 : [8:0:16]
       ZPL_SYMLINK =  30011 : [0:3:17]

The names of these attributes come from the enum of known system attributes in zfs_sa.h. The important bit of the values is the '[16:0:1]' portion, which is a decoded version of the raw number. The format of the raw number is covered in sa_impl.h, but the short version is that the first number is the total length of the attribute's value in bytes, the third is its attribute number within the filesystem, and the middle number is an index of how to byteswap it if necessary (and sa.c has a nice comment about the whole scheme at the top).

(The attributes with a listed size of 0 store their data in extra special ways that are beyond the scope of this entry.)
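
At least for the registrations shown above, the raw value unpacks as the length in the top 8 bits, the byteswap index in the next 8 bits, and the attribute number in the low 16 bits, so you can decode one by hand with shell arithmetic (a sketch, using ZPL_MTIME's value):

# v=0x10000001
# echo $(( v >> 24 )):$(( (v >> 16) & 0xff )):$(( v & 0xffff ))
16:0:1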

The more interesting thing is the SA attribute layouts:

# zdb -dddd fs19-scratch-01/w/430 36
[...]
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       36    1   128K    16K    16K     512    32K  100.00  SA attr layouts
[...]
    2 = [ 5  6  4  12  13  7  11  0  1  2  3  8  21  16  19 ]
    4 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  17 ]
    3 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19 ]

This particular filesystem has three attribute layouts that have been used by dnodes, and as you can see they are mostly the same. Layout 3 is the common subset, with all of the basic inode attributes you'd expect in a Unix filesystem; layout 2 adds attribute 21 (ZPL_PROJID), and layout 4 adds attribute 17 (ZPL_SYMLINK).

It's possible to have a lot more layouts than this. Here is the collection of layouts for my home desktop's home directory filesystem (which uses the same registered attribute numbers as the filesystem above, so you can look up there for them):

    4 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  9 ]
    3 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  17 ]
    7 = [ 5  6  4  12  13  7  11  0  1  2  3  8  21  16  19  9 ]
    2 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19 ]
    5 = [ 5  6  4  12  13  7  11  0  1  2  3  8  10  16  19 ]
    6 = [ 5  6  4  12  13  7  11  0  1  2  3  8  21  16  19 ]

Incidentally, notice how these layout numbers aren't the same as the layout numbers on the first filesystem; layout 3 on the first filesystem is layout 2 on my home directory filesystem, layout 4 (symlinks) is layout 3, and layout 2 (project ID) is layout 6. The additional layouts in my home directory filesystem add xattrs (id 9) or 'rdev' (id 10) to some combination of the other attributes.

One of the interesting aspects of this is that you can use the SA attribute layouts to tell if a ZFS filesystem definitely doesn't have some sort of files in it. For example, we know that there are no device special files or files with xattrs in /w/430, because there are no SA attribute layouts that include those attributes. And neither of these two filesystems has ever had ACLs set on any of their files, because neither of them has layouts with either of the SA ACL attributes.

(Attribute layouts are never removed once created, so a filesystem with a layout with the 'rdev' attribute in it may still not have any device special files in it right now; they could all have been removed.)

Unfortunately, I can't see any obvious way to get zdb to tell you what the current attribute layout is for a specific dnode. At best you have to try to deduce it from what 'zdb -dddd' will print for the dnode's attributes.
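
What you can do is dump a specific file's dnode, since a file's inode number (as reported by ls -i or stat) is its ZFS object number; the path and object number here are made up:

# ls -i /w/430/somefile
12345 /w/430/somefile
# zdb -dddd fs19-scratch-01/w/430 12345
[...]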

(I've recently acquired a reason to dig into the details of ZFS system attributes.)

Sidebar: A brief digression on xattrs in ZFS

As covered in zfsprops(7)'s section on 'xattr=', there are two storage schemes for xattrs in ZFS (well, in OpenZFS on Linux and FreeBSD). At the attribute level, 'ZPL_XATTR' is the older, more general 'store it in directories and files' approach, while 'ZPL_DXATTR' is the 'store it as part of system attributes' one ('xattr=sa'). When dumping a dnode in zdb, zdb will directly print SA xattrs, but for directory xattrs it simply reports 'xattr = <object id>', where the object ID is for the xattr directory. To see the names of the xattrs set on such a file, you need to also dump the xattr directory object with zdb.
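
(Which scheme gets used for new xattrs is controlled by the per-filesystem 'xattr' property. A quick sketch, with made-up names:)

# zfs set xattr=sa tank/fs
# setfattr -n user.demo -v hello /tank/fs/somefile
# getfattr -d /tank/fs/somefile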

(Internally the SA xattrs are stored as a nvlist, because ZFS loves nvlists and nvpairs, more or less because Solaris did at the time.)

ZFS's transactional guarantees from a user perspective

By: cks
29 May 2024 at 02:58

I said recently on the Fediverse that ZFS's transactional guarantees were rather complicated both with and without fsync(). I've written about these before in terms of transaction groups and the ZFS Intent Log (ZIL), but that obscured the user visible behavior under the technical details. So here's an attempt at describing just the visible behavior, hopefully in a way that people can follow despite how it gets complicated.

ZFS has two levels of transactional behavior. The basic layer is what happens when you don't use fsync() (or the filesystem is ignoring it). At this level, all changes to a ZFS filesystem are strongly ordered by the time they happened. ZFS may lose some activity at the end, but if you did operation A before operation B and there is a crash, the possible outcomes afterward are nothing, A, or A and B; you can never have B without A. This strictly time ordered view of filesystem changes is periodically flushed to disk by ZFS; in modern ZFS, such a flush is typically started every five seconds (although completing a flush can take some time). This is generally called a transaction group (txg) commit.

The second layer of transactional behavior comes in if you fsync() something. When you fsync() something (and fsync is enabled on the filesystem, which is the default), all uncommitted metadata changes are immediately flushed to disk along with whatever uncommitted file data changes you requested a fsync() for (if you fsync'd a file instead of a directory). If several processes request fsync()s at once, all of their requests will be merged together, so a single immediate flush may include data for multiple files. Uncommitted file changes that no one requested a fsync() for will not be immediately flushed and will instead wait for the next regular non-fsync() flush (the next txg commit).

(This is relatively normal behavior for fsync(), except that on most filesystems a fsync() doesn't immediately flush all metadata changes. Metadata changes include things like creating, renaming, or removing files.)

A fsync() can break the strict time order of ZFS changes that exists in the basic layer. If you write data to A, write data to B, fsync() B but not A, and ZFS crashes immediately, the data for B will still be there but the change to A may have been lost. In some situations this can result in zero length files even though they were intended to have data. However, if enough time goes by everything from before the fsync() will have been flushed out as part of the non-fsync() flush process.
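
As a concrete sketch of that ordering hazard (the file names are invented; GNU dd's 'conv=fsync' makes it fsync() the output file when it's done):

# dd if=/srv/bigdata of=/tank/fs/A bs=1M
# dd if=/srv/smalldata of=/tank/fs/B bs=1M conv=fsync

If the machine crashed right after the second dd finished, B's data is durable because of the fsync(), but A could come back truncated or missing.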

As a technical detail, ZFS makes it so that all of the changes that are part of a particular periodic flush are tied to each other (if there have been no fsyncs to meddle with the ordering); either all of them will appear after a crash or none of them will. This can be used to create atomic groups of changes that will always appear together (or be lost together), by making sure that all changes are part of the same periodic flush (in ZFS jargon, they are part of the same transaction group (txg)). However, ZFS doesn't give programs any explicit way to do this, and this atomic grouping can be messed up if someone fsync()s at an inconvenient time.

The flow of activity in the ZFS Intent Log (as I understand it)

By: cks
20 February 2024 at 02:58

The ZFS Intent Log (ZIL) is a confusing thing once you get into the details, and for reasons beyond the scope of this entry I recently needed to sort out the details of some aspects of how it works. So here is what I know about how things flow into the ZIL, both in memory and then on to disk.

(As always, there is no single 'ZFS Intent Log' in a ZFS pool. Each dataset (a filesystem or a zvol) has its own logically separate ZIL. We talk about 'the ZIL' as a convenience.)

When you perform activities that modify a ZFS dataset, each activity creates its own ZIL log record (a transaction in ZIL jargon, sometimes called an 'itx', probably short for 'intent transaction') that is put into that dataset's in-memory ZIL log. This includes both straightforward data writes and metadata activity like creating or renaming files. You can see a big list of all of the possible transaction types in zil.h as all of the TX_* definitions (which have brief useful comments). In-memory ZIL transactions aren't necessarily immediately flushed to disk, especially for things like simply doing a write() to a file. The reason that plain write()s to a file are (still) given ZIL transactions is that you may call fsync() on the file later. If you don't call fsync() and the regular ZFS transaction group commits with your write()s, those ZIL transactions will be quietly cleaned out of the in-memory ZIL log (along with all of the other now unneeded ZIL transactions).

(All of this assumes that your dataset doesn't have 'sync=disabled' set, which turns off the in-memory ZIL as one of its effects.)

When you perform an action such as fsync() or sync() that requests that in-memory ZFS state be made durable on disk, ZFS gathers up some or all of those in-memory ZIL transactions and writes them to disk in one go, as a sequence of log (write) blocks ('lwb' or 'lwbs' in ZFS source code), which pack together those ZIL transaction records. This is called a ZIL commit. Depending on various factors, the flushed out data you write() may or may not be included in the log (write) blocks committed to the (dataset's) ZIL. Sometimes your file data will be written directly into its future permanent location in the pool's free space (which is safe) and the ZIL commit will have only a pointer to this location (its DVA).

(For a discussion of this, see the comments about the WR_* constants in zil.h. Also, while in memory, ZFS transactions are classified as either 'synchronous' or 'asynchronous'. Sync transactions are always part of a ZIL commit, but async transactions are only included as necessary. See zil_impl.h and also my entry discussing this.)

It's possible for several processes (or threads) to all call sync() or fsync() at once (well, before the first one finishes committing the ZIL). In this case, their requests can all be merged together into one ZIL commit that covers all of them. This means that fsync() and sync() calls don't necessarily match up one to one with ZIL commits. I believe it's also possible for a fsync() or sync() to not result in a ZIL commit if all of the relevant data has already been written out as part of a regular ZFS transaction group (or a previous request).

Because of all of this, there are various different ZIL related metrics that you may be interested in, sometimes with picky but important differences between them. For example, there is a difference between 'the number of bytes written to the ZIL' and 'the number of bytes written as part of ZIL commits', since the latter would include data written directly to its final space in the main pool. You might care about the latter when you're investigating the overall IO impact of ZIL commits but the former if you're looking at sizing a separate log device (a 'slog' in ZFS terminology).
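
On Linux, one place to find some of these numbers is the ZIL kstats, although exactly which counters exist varies with your OpenZFS version, so treat this as a hedged pointer rather than a recipe:

# cat /proc/spl/kstat/zfs/zil
[...]

The counters there (things like the ZIL commit count and the various 'itx' byte counts) are where you can start to pull apart 'written to the ZIL' from 'written as part of ZIL commits'.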
