To summarize, ZFS's System Attributes (SAs)
are a way for ZFS to pack a somewhat arbitrary collection of
additional information, such as the parent directory of a file or directory and the target of a symbolic link,
into ZFS dnodes in a general and flexible
way that doesn't hard code the specific combinations of attributes
that can be used together. ZFS system attributes are normally stored
in extra space in dnodes that's called the bonus buffer, but the
system attributes can overflow to a spill block if necessary.
I've written more about the high level side of this in my entry
on ZFS SAs, but today I'm going to write up
some concrete details of what you'd see when you look at a ZFS
filesystem with tools like zdb.
When ZFS stores the SAs for a particular dnode, it simply packs all
of their values together in a blob of data. It knows which part of
the blob is which through an attribute layout, which tells it
which attributes are in the layout and in what order. Attribute
layouts are created and registered as they are needed, which is to
say when some dnode wants to use that particular combination of
attributes. Generally there are only a few combinations of system
attributes that get used, so a typical ZFS filesystem will not have
many SA layouts. System attributes are numbered, but the specific
numbering may differ from filesystem to filesystem. In practice it
probably mostly won't, since most attributes usually get registered
pretty early in the life of a ZFS filesystem and in a predictable
order.
(For example, the creation of a ZFS filesystem necessarily means
creating a directory dnode for its top level, so all of the system
attributes used for directories will immediately get registered,
along with an attribute layout.)
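As a rough illustration of how a layout gives meaning to an otherwise opaque blob, here is a minimal Python sketch. The attribute numbers, names, sizes, and the `unpack_blob` helper are all made up for the example, and it ignores the header and the variable-length attributes that the real on-disk format has to deal with.

```python
import struct

# Hypothetical registry: attribute number -> (name, size in bytes).
# These numbers and sizes are illustrative, not read from a real filesystem.
REGISTRY = {
    5: ("mode", 8),
    6: ("size", 8),
    7: ("parent", 8),
}

def unpack_blob(layout, blob):
    """Split a packed blob into named values, walking it in layout order."""
    values = {}
    offset = 0
    for attr_num in layout:
        name, size = REGISTRY[attr_num]
        # Every attribute in this toy registry is one little-endian uint64.
        (values[name],) = struct.unpack_from("<Q", blob, offset)
        offset += size
    return values

layout = [5, 6, 7]  # the order is the thing a layout records
blob = struct.pack("<3Q", 0o100644, 4096, 34)
print(unpack_blob(layout, blob))
```

The blob itself carries no names or numbers; the same bytes read through a layout with a different order would be assigned to different attributes.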
The attribute layout for a given dnode is not fixed when the file
is created; instead, it varies depending on what system attributes
that dnode needs at the moment. The high level ZFS code simply sets
or clears specific system attributes on the dnode, and the low(er)
level system attribute code takes care of either finding or creating
an attribute layout that matches the current set of attributes the
dnode has. Many system attributes are constant over the life of the
dnode, but I think others can come and go, such as the system
attributes used for xattrs.
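The 'find or create a matching layout' idea can be sketched as a toy model in Python. This is not how the ZFS code is written; the `LayoutTable` class, its method name, and the starting layout number are assumptions made purely for illustration.

```python
class LayoutTable:
    """Toy model: map an ordered list of attributes to a layout number."""

    def __init__(self, first_number=2):
        # The starting number is an arbitrary choice for this sketch.
        self._by_attrs = {}
        self._next = first_number

    def find_or_create(self, attrs):
        key = tuple(attrs)
        if key not in self._by_attrs:
            # First dnode to want this combination: register a new layout.
            self._by_attrs[key] = self._next
            self._next += 1
        return self._by_attrs[key]

table = LayoutTable()
plain = ["mode", "size", "parent"]
before = table.find_or_create(plain)
with_xattr = table.find_or_create(plain + ["xattr"])  # dnode gains an xattr
after = table.find_or_create(plain)                   # xattr removed again
print(before, with_xattr, after)
```

In this model a dnode that gains and then loses an attribute ends up back on its original layout, and the layout created in between stays registered.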
Every ZFS filesystem with system attributes has three special dnodes
involved in this process, which zdb will report as the "SA master
node", the "SA attr registration" dnode, and the "SA attr layouts"
dnode. As far as I know, the SA master node's current purpose is
to point to the other two dnodes. The SA attribute registry dnode
is where the potentially filesystem specific numbers for attributes
are registered, and the SA attribute layouts dnode is where the
various layouts in use on the filesystem are tracked. The SA master
(d)node itself is pointed to by the "ZFS master node", which is
always object 1.
So let's use zdb to take a look at a typical case:
# zdb -dddd fs19-scratch-01/w/430 1
[...]
Object lvl iblk dblk dsize dnsize lsize %full type
1 1 128K 512 8K 512 512 100.00 ZFS master node
[...]
SA_ATTRS = 32
[...]
# zdb -dddd fs19-scratch-01/w/430 32
Object lvl iblk dblk dsize dnsize lsize %full type
32 1 128K 512 0 512 512 100.00 SA master node
[...]
LAYOUTS = 36
REGISTRY = 35
It's common for the registry and layouts dnodes to have consecutive object numbers, since
they're generally allocated at the same time. On most filesystems
they will have very low object numbers, since they were created
when the filesystem was.
The registry is generally going to be pretty boring looking:
# zdb -dddd fs19-scratch-01/w/430 35
[...]
Object lvl iblk dblk dsize dnsize lsize %full type
35 1 128K 1.50K 8K 512 1.50K 100.00 SA attr registration
[...]
ZPL_SCANSTAMP = 20030012 : [32:3:18]
ZPL_RDEV = 800000a : [8:0:10]
ZPL_FLAGS = 800000b : [8:0:11]
ZPL_GEN = 8000004 : [8:0:4]
ZPL_MTIME = 10000001 : [16:0:1]
ZPL_CTIME = 10000002 : [16:0:2]
ZPL_XATTR = 8000009 : [8:0:9]
ZPL_UID = 800000c : [8:0:12]
ZPL_ZNODE_ACL = 5803000f : [88:3:15]
ZPL_PROJID = 8000015 : [8:0:21]
ZPL_ATIME = 10000000 : [16:0:0]
ZPL_SIZE = 8000006 : [8:0:6]
ZPL_LINKS = 8000008 : [8:0:8]
ZPL_PARENT = 8000007 : [8:0:7]
ZPL_MODE = 8000005 : [8:0:5]
ZPL_PAD = 2000000e : [32:0:14]
ZPL_DACL_ACES = 40013 : [0:4:19]
ZPL_GID = 800000d : [8:0:13]
ZPL_CRTIME = 10000003 : [16:0:3]
ZPL_DXATTR = 30014 : [0:3:20]
ZPL_DACL_COUNT = 8000010 : [8:0:16]
ZPL_SYMLINK = 30011 : [0:3:17]
The names of these attributes come from the enum of known system
attributes in zfs_sa.h. The important bit of each value is the
'[16:0:1]' portion, which is a decoded version of the raw number.
The format of the raw number is covered in sa_impl.h, but the short
version is that the first number is the total length of the
attribute's value in bytes, the third is its attribute number within
the filesystem, and the middle number is an index of how to byteswap
it if necessary (and sa.c has a nice comment about the whole scheme
at the top).
(The attributes with a listed size of 0 store their data in extra
special ways that are beyond the scope of this entry.)
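To make the relationship between the raw number and the decoded triple concrete, here is a small Python sketch. The bit positions are inferred from the zdb output above (length in the bits from 24 up, byteswap index in bits 16 to 23, attribute number in the low 16 bits), and `decode_sa_attr` is a name invented for this example; sa_impl.h remains the authoritative definition.

```python
def decode_sa_attr(raw):
    """Return (length, byteswap index, attribute number) for a raw value."""
    length = (raw >> 24) & 0xFFFF   # total size of the value in bytes
    bswap = (raw >> 16) & 0xFF      # index of the byteswap method
    attr_num = raw & 0xFFFF         # attribute number in this filesystem
    return length, bswap, attr_num

# Raw values copied from the registry listing above.
for name, raw in [("ZPL_MTIME", 0x10000001),
                  ("ZPL_ZNODE_ACL", 0x5803000F),
                  ("ZPL_SYMLINK", 0x30011)]:
    print(name, "[%d:%d:%d]" % decode_sa_attr(raw))
```

For these three values the sketch reproduces the '[16:0:1]', '[88:3:15]' and '[0:3:17]' that zdb printed.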
The more interesting thing is the SA attribute layouts:
# zdb -dddd fs19-scratch-01/w/430 36
[...]
Object lvl iblk dblk dsize dnsize lsize %full type
36 1 128K 16K 16K 512 32K 100.00 SA attr layouts
[...]
2 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 ]
4 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 17 ]
3 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 ]
This particular filesystem has three attribute layouts that have
been used by dnodes, and as you can see they are mostly the same.
Layout 3 is the common subset, with all of the basic inode attributes
you'd expect in a Unix filesystem; layout 2 adds attribute 21
(ZPL_PROJID), and layout 4 adds attribute 17 (ZPL_SYMLINK).
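Reading layouts by eye gets tedious, so here is a short Python sketch that turns the numbers into names and reports what each layout has beyond the attributes common to all of them. The name table is copied from the registry output above; the `extras` helper is made up for this example.

```python
# Attribute number -> name, copied from the registry output above.
NAMES = {
    0: "ZPL_ATIME", 1: "ZPL_MTIME", 2: "ZPL_CTIME", 3: "ZPL_CRTIME",
    4: "ZPL_GEN", 5: "ZPL_MODE", 6: "ZPL_SIZE", 7: "ZPL_PARENT",
    8: "ZPL_LINKS", 9: "ZPL_XATTR", 10: "ZPL_RDEV", 11: "ZPL_FLAGS",
    12: "ZPL_UID", 13: "ZPL_GID", 14: "ZPL_PAD", 15: "ZPL_ZNODE_ACL",
    16: "ZPL_DACL_COUNT", 17: "ZPL_SYMLINK", 18: "ZPL_SCANSTAMP",
    19: "ZPL_DACL_ACES", 20: "ZPL_DXATTR", 21: "ZPL_PROJID",
}

# Layouts copied from the 'SA attr layouts' output above.
LAYOUTS = {
    2: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19],
    4: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17],
    3: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19],
}

def extras(layouts):
    """For each layout, name the attributes not shared by every layout."""
    common = set.intersection(*(set(attrs) for attrs in layouts.values()))
    return {num: [NAMES[a] for a in attrs if a not in common]
            for num, attrs in layouts.items()}

print(extras(LAYOUTS))
```

On this filesystem that reports ZPL_PROJID for layout 2, ZPL_SYMLINK for layout 4, and nothing extra for layout 3.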
It's possible to have a lot more layouts than this. Here is the
collection of layouts for my home desktop's home directory filesystem
(which uses the same registered attribute numbers as the filesystem
above, so you can look up there for them):
4 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 9 ]
3 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 17 ]
7 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 9 ]
2 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 ]
5 = [ 5 6 4 12 13 7 11 0 1 2 3 8 10 16 19 ]
6 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 ]
Incidentally, notice how these layout numbers aren't the same as
the layout numbers on the first filesystem; layout 3 on the first
filesystem is layout 2 on my home directory filesystem, layout 4
(symlinks) is layout 3, and layout 2 (project ID) is layout 6. The
additional layouts in my home directory filesystem add xattrs (id 9) or
'rdev' (id 10) to some combination of the other attributes.
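The correspondence between the two filesystems' layout numbers can be worked out mechanically by matching on the attribute lists. This Python sketch assumes, as is the case here, that both filesystems use the same registered attribute numbers; `match_layouts` is a helper invented for the example.

```python
BASE = [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8]
ACL = [16, 19]

# Layouts of the first filesystem and of the home directory filesystem,
# as listed above.
first = {
    2: BASE + [21] + ACL,
    4: BASE + ACL + [17],
    3: BASE + ACL,
}
home = {
    4: BASE + ACL + [9],
    3: BASE + ACL + [17],
    7: BASE + [21] + ACL + [9],
    2: BASE + ACL,
    5: BASE + [10] + ACL,
    6: BASE + [21] + ACL,
}

def match_layouts(a, b):
    """Map each layout number in a to the number of the identical layout in b."""
    by_attrs = {tuple(attrs): num for num, attrs in b.items()}
    return {num: by_attrs.get(tuple(attrs)) for num, attrs in a.items()}

print(match_layouts(first, home))
```

A layout with no identical counterpart on the other filesystem maps to `None`, which is what you get for the xattr and 'rdev' layouts if you run the match in the other direction.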
One of the interesting aspects of this is that you can use the SA
attribute layouts to tell if a ZFS filesystem definitely doesn't
have some sort of files in it. For example, we know that there are
no device special files or files with xattrs in /w/430, because
there are no SA attribute layouts that include those attributes.
And neither of these two filesystems has ever had ACLs set on any
of its files, because neither of them has a layout with either of
the SA ACL attributes.
(Attribute layouts are never removed once created, so a filesystem
with a layout with the 'rdev' attribute in it may still not have
any device special files in it right now; they could all have been
removed.)
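This sort of check is easy to script. The Python sketch below parses layout lines in the 'N = [ ... ]' form shown above and asks whether any layout has ever included a given attribute number. The line format is assumed from the output in this entry and may differ between zdb versions, and both helper names are invented for the example.

```python
import re

def parse_layouts(text):
    """Parse 'N = [ a b c ]' lines from zdb output into {N: [a, b, c]}."""
    layouts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) = \[([\d\s]*)\]", line)
        if m:
            layouts[int(m.group(1))] = [int(n) for n in m.group(2).split()]
    return layouts

def ever_used(layouts, attr_num):
    """True if some registered layout includes this attribute number."""
    return any(attr_num in attrs for attrs in layouts.values())

zdb_output = """\
2 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 ]
4 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 17 ]
3 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 ]
"""
layouts = parse_layouts(zdb_output)
for name, num in [("ZPL_RDEV", 10), ("ZPL_XATTR", 9), ("ZPL_SYMLINK", 17)]:
    print(name, ever_used(layouts, num))
```

A result of False means no dnode has ever needed that attribute; a result of True only means that one did at some point.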
Unfortunately, I can't see any obvious way to get zdb to tell you
what the current attribute layout is for a specific dnode. At best
you have to try to deduce it from what 'zdb -dddd' will print for
the dnode's attributes.
(I've recently acquired a reason to dig into the details of ZFS
system attributes.)
Sidebar: A brief digression on xattrs in ZFS
As covered in zfsprops(7)'s section on 'xattr=',
there are two storage schemes for xattrs in ZFS (well, in OpenZFS
on Linux and FreeBSD). At the attribute level, 'ZPL_XATTR' is
the older, more general 'store it in directories and files' approach,
while 'ZPL_DXATTR' is the 'store it as part of system attributes'
one ('xattr=sa'). When dumping a dnode in zdb, zdb will directly
print SA xattrs, but for directory xattrs it simply reports
'xattr = <object id>', where the object ID is for the xattr directory.
To see the names of the xattrs set on such a file, you need to also
dump the xattr directory object with zdb.
(Internally the SA xattrs are stored as an nvlist,
because ZFS loves nvlists and nvpairs, more or less because Solaris
did at the time.)