As I thought I might, today I spent some
time adding full and relatively honest type hints to my recent
Python program. The
experience didn't go entirely smoothly and it left me with a number
of learning experiences and things I want to note down in case I
ever do this again.
The starting point is that my normal style of coding small programs
is to not make classes to represent different sorts of things and
instead use only basic built in collection types, like lists, tuples,
dictionaries, and so on. When you use basic types this way, it's
very easy to pass or return the wrong 'shape' of thing (I did it
once in the process of writing my program), and I'd like Python
type hints to be able to tell me about this.
(The first note I want to remember is that mypy becomes very irate
at you in obscure ways if you ever accidentally reuse the same
(local) variable name for two different purposes with two different
types. I accidentally reused the name 'data', using it first for a
str and second for a dict that came from an 'Any' typed object, and
the mypy complaints were hard to decode; I believe it complained
that I couldn't index a str with a str on a line where I did
'data["key"]'.)
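As a rough sketch of that situation (the function names and the JSON input are made up for illustration, not taken from the real program), the first function below reuses 'data' for both a str and an 'Any' typed value, and the second gives each purpose its own name. Both behave the same when run; going by the description above, only the first should draw a complaint from mypy.

```python
import json

def get_key_reused(raw: str) -> str:
    data = raw               # mypy infers 'data: str' from this first assignment
    data = json.loads(data)  # json.loads() is typed as returning Any; to mypy 'data' is still a str
    return data["key"]       # fine at runtime, but mypy sees a str being indexed with a str

def get_key_renamed(raw: str) -> str:
    text = raw
    parsed: dict[str, str] = json.loads(text)  # a second name with its own declared type
    return parsed["key"]
```

Giving the second value its own name (and ideally its own annotation) is the simplest way to avoid the confusing message.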
When you work with data structures created from built in collections,
you can wind up with long, tangled compound type names, like 'tuple[str,
list[tuple[str, int]]]' (which is a real type in my program). These
are annoying to keep typing and easy to make mistakes with, so
Python type hints provide two ways of giving them short names, in
type aliases and typing.NewType. These look almost the same:
# type alias:
type hostAlertsA = tuple[str, list[tuple[str, int]]]
# NewType():
hostAlertsT = NewType('hostAlertsT', tuple[str, list[tuple[str, int]]])
The problem with type aliases is that they are aliases. All aliases
for a type are considered to be the same, and mypy won't warn if
you call a function that expects one with a value that was declared
to be another. Suppose you have two sorts of strings, ones that
are a host name and ones that are an alert name, and you would like
to keep them straight. Suppose that you write:
# simple type aliases
type alertName = str
type hostName = str
def manglehost(hname: hostName) -> hostName:
    [....]
Because these are only type aliases, and all aliases of the same
underlying type are interchangeable, you have not achieved your goal
of keeping yourself from confusing host and alert names when you call
'manglehost()'.
In order to do this, you need to use NewType(), at which point mypy
will complain (and also often force you to explicitly mark bare
strings as one or the other, with 'alertName(yourstr)' or
'hostName(yourstr)').
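A minimal sketch of the NewType() version follows. The body of 'manglehost()' here is invented purely for illustration (it just lower-cases the name), and the sample strings are placeholders.

```python
from typing import NewType

hostName = NewType('hostName', str)
alertName = NewType('alertName', str)

def manglehost(hname: hostName) -> hostName:
    # Hypothetical body for illustration only: lower-case the host name.
    # The result of .lower() is a plain str, so it has to be re-marked.
    return hostName(hname.lower())

host = hostName("WWW.Example.Org")
alert = alertName("DiskFull")

mangled = manglehost(host)     # accepted by mypy
# manglehost(alert)            # mypy should reject this: an alertName is not a hostName
# manglehost("bare string")    # likewise rejected: a plain str is not a hostName
```

At runtime a NewType is only an identity function, so 'mangled' is still an ordinary str; the distinction exists purely for the type checker.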
If I want as much protection as possible against this sort of type
confusion, I want to make as many things as possible be NewType()s
instead of type aliases. Unfortunately, as far as I can see,
NewType()s have some drawbacks in mypy for my sort of usage.
The first drawback is that you cannot create a NewType of 'Any':
error: Argument 2 to NewType(...) must be subclassable (got "Any") [valid-newtype]
In order to use NewType, I must specify concrete details of my
actual (current) implementation, rather than saying just 'this
is a distinct type but anything can be done with it'.
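To make the restriction concrete, here is a small sketch. The name 'blobT' and the 'load_blob()' function are hypothetical, and picking 'dict[str, Any]' as the concrete type is just one assumption about what the implementation might be.

```python
from typing import Any, NewType

# Rejected by mypy with [valid-newtype] (nothing checks this at runtime):
# blobT = NewType('blobT', Any)

# Accepted, but now the type hint commits to the value being a dict.
blobT = NewType('blobT', dict[str, Any])

def load_blob() -> blobT:
    # Hypothetical producer, standing in for whatever creates the value.
    return blobT({"host": "www", "count": 3})

blob = load_blob()
```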
The second drawback is that this distinct typing is actually a
problem when you do certain sorts of transformations of collections.
Let's say we have alerts, which have a name and a start time, and
hosts, which have a hostname and a list of alerts:
alertT = NewType('alertT', tuple[str, int])
hostAlT = NewType('hostAlT', tuple[str, list[alertT]])
We have a function that receives a dictionary where the keys are
hosts and the values are their alerts and turns it into a sorted
list of hosts and their alerts, which is to say a list[hostAlT].
The following Python code looks sensible on the surface:
def toAlertList(hosts: dict[str, list[alertT]]) -> list[hostAlT]:
    linear = list(hosts.items())
    # Don't worry about the sorting for now
    return linear
If you try to check this, mypy will declare:
error: Incompatible return value type (got "list[tuple[str, list[alertT]]]", expected "list[hostAlT]") [return-value]
Initially I thought this was mypy being limited, but in writing
this entry I've realized that mypy is correct. Our .items() returns
a tuple[str, list[alertT]], but while it has the same shape as our
hostAlT, it is not the same thing; that's what it means for hostAlT
to be a distinct type.
However, it is a problem that as far as I know, there is no type
checked way to get mypy to convert the list we have into a
list[hostAlT]. If you create a new NewType to be the list type,
call it 'aListT', and try to convert 'linear' to it with 'l2 =
aListT(linear)', you will get more or less the same complaint:
error: Argument 1 to "aListT" has incompatible type "list[tuple[str, list[alertT]]]"; expected "list[hostAlT]" [arg-type]
This is a case where as far as I can see I must use a type alias
for 'hostAlT' in order to get the structural equivalence conversion,
or alternately use the wordier and as far as I know less efficient
list comprehension version of list() so that I can tell mypy that
I'm transforming each key/value pair into a hostAlT value:
linear = [hostAlT(x) for x in hosts.items()]
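Put together, a complete version of the function using the list comprehension might look like the sketch below. Sorting by host name is an assumption about what 'sorted' means here, and the sample data is invented.

```python
from typing import NewType

alertT = NewType('alertT', tuple[str, int])
hostAlT = NewType('hostAlT', tuple[str, list[alertT]])

def toAlertList(hosts: dict[str, list[alertT]]) -> list[hostAlT]:
    # Explicitly mark each (hostname, alerts) pair as a hostAlT.
    linear = [hostAlT(x) for x in hosts.items()]
    # Assumed ordering for illustration: sort by host name.
    linear.sort(key=lambda h: h[0])
    return linear

# Invented sample data.
sample = {
    "b.example": [alertT(("disk", 10))],
    "a.example": [],
}
result = toAlertList(sample)
```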
I'd have the same problem in the actual code (instead of in the
type hint checking) if I was using, for example, a namedtuple
to represent a host and its alerts. Calling hosts.items() wouldn't
generate objects of my named tuple type, just unnamed standard
tuples.
Possibly this is a sign that I should go back through my small
programs after I more or less finish them and convert this sort of
casual use of tuples into namedtuple (or the type hinted version)
and dataclass types. If nothing else, this would serve as more
explicit documentation
for future me about what those tuple fields are. I would have to
give up those clever 'list(hosts.items())' conversion tricks in
favour of the more explicit list comprehension version, but that's
not necessarily a bad thing.
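As a sketch of what that conversion could look like with typing.NamedTuple (the class names, field names, and function name are all made up here, and sorting by host name is again an assumption):

```python
from typing import NamedTuple

class Alert(NamedTuple):
    name: str
    start: int

class HostAlerts(NamedTuple):
    host: str
    alerts: list[Alert]

def to_alert_list(hosts: dict[str, list[Alert]]) -> list[HostAlerts]:
    # hosts.items() yields plain tuples, so each one is rebuilt explicitly.
    linear = [HostAlerts(host, alerts) for host, alerts in hosts.items()]
    linear.sort(key=lambda h: h.host)
    return linear

# Invented sample data.
sample = {
    "b.example": [Alert("disk", 10)],
    "a.example": [],
}
result = to_alert_list(sample)
```

Unlike a NewType, these are real classes at runtime, so the fields can be accessed by name ('result[0].host') as well as by position.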
If you have a distinct NewType() and mypy is happy enough with you,
both typing.cast() and calling the NewType itself (as in 'hostAlT(...)')
will cause mypy to consider your value to now be of
the new type. However, they have different safety levels and
restrictions. With cast(), there are no type hint checking guardrails
at all; you can cast() an integer literal into an alleged string
and mypy won't make a peep. With, for example, 'hostAlT(...)', mypy
will apply a certain amount of compatibility checking. However,
as we saw above in the 'aListT' example, mypy may still report a
problem on the type change and there are certain type changes you
can't get it to accept.
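The difference can be sketched as follows, reusing the alertT and hostAlT types from earlier; the variable names and sample values are invented for illustration.

```python
from typing import NewType, cast

alertT = NewType('alertT', tuple[str, int])
hostAlT = NewType('hostAlT', tuple[str, list[alertT]])

# cast() is unchecked: mypy accepts this, and at runtime the value is still the int 10.
not_a_str = cast(str, 10)

# Calling the NewType is checked by mypy against tuple[str, list[alertT]].
one = hostAlT(("www", [alertT(("disk", 10))]))
# hostAlT(("www", 10))    # mypy should reject this: the second element has the wrong shape

hosts: dict[str, list[alertT]] = {"www": [alertT(("disk", 10))]}
# cast() will also force the list conversion that mypy otherwise refuses,
# but nothing verifies that the shapes really match.
linear = cast(list[hostAlT], list(hosts.items()))
```

The cast() on the last line silences the complaint, but it is exactly the unchecked escape hatch rather than a type checked conversion.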
As far as I know, there's no way to get mypy to temporarily switch
to structural compatibility checking here. Perhaps there are deep
type safety reasons to disallow that.