Annotations

Annotations perform two functions:

  • Inferring specific Arg-level options from the annotation, most of which control parsing behavior for that argument.

  • Mapping the raw parsed arguments into the specified output types.

If the “supported” set of behaviors given by the inference system does not produce the obvious result for some case, feel free to submit a bug report: the intent is that it should feel natural for all supportable cases. Where it does not make sense to build in specific behavior for a type (for example, third-party library types), you can instead annotate the field with the desired type and explicitly supply parse, or any other Arg option that inference would otherwise fill in.

Note

Annotated is used pervasively throughout the examples and docs. On Python versions below 3.9, Annotated is not included in the typing module.

Instead, you can install typing_extensions, which backports typing objects to earlier versions of Python, and import Annotated from there.

Note

Similarly, the docs use | for Unions throughout. On Python versions below 3.10, you may need to add from __future__ import annotations at the top of your file, so that annotations are not evaluated by the Python runtime.

Positional vs Option

Below, any reference to “positional” means a field which does not set short=... or long=.... This includes any field with no explicit Arg instance at all.

  • foo: str

  • foo: Annotated[str, Arg()]

Whereas an “option” is a field which does set short=... or long=....

  • foo: Annotated[str, Arg(short=True)]

  • foo: Annotated[str, Arg(long=True)]

  • foo: Annotated[str, Arg(short="-f", long=True)]

This is an important distinction, because the inference system makes different decisions depending on whether a given field is positional or not.
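To make the rule concrete, here is a minimal stdlib-only sketch of the positional check. It is not cappa's implementation: ArgSketch and is_positional are hypothetical names, and ArgSketch models only the short and long fields of Arg.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ArgSketch:
    """Hypothetical stand-in for cappa.Arg, modelling only short and long."""

    short: Union[bool, str, None] = None
    long: Union[bool, str, List[str], None] = None


def is_positional(arg: Optional[ArgSketch]) -> bool:
    """A field is positional unless it sets short=... or long=...."""
    if arg is None:
        # No explicit Arg at all, e.g. `foo: str`.
        return True
    # Any truthy short/long value (True, "-f", ["--foo"]) makes it an option.
    return not arg.short and not arg.long
```

Each example field above maps onto one call; for instance, is_positional(ArgSketch(short=True)) is False, while is_positional(None) and is_positional(ArgSketch()) are both True.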

Arg Inference

Note that in all cases below, any individual inferred value can be overridden whenever the default inference is not what you’re looking for!

Bool

In most cases, an un-annotated field like foo: str is assumed to be a positional argument.

By contrast, foo: bool is assumed to be an optional flag, because bools corresponding to flags are by far the more common case.

By default a foo: bool implies foo: Annotated[bool, Arg(long=True, num_args=0, action=ArgAction.store_true, required=False)].

  • If you also want to enable a “short” version of the flag, you can set Arg(short=True).

  • If you include a --no-<name> long name variant like Arg(long=['--foo', '--no-foo']), the --no-foo variant will be inferred as ArgAction.store_false.

  • If a vanilla bool annotation (without an inverted variant) is combined with a default value of True, then the action is inverted to ArgAction.store_false.

    This is because foo: bool = True could never result in a False value if supplying --foo at the command line also stored True.

See Arg.short and Arg.long for more details on customization of the specific flag names.

Note

If (for whatever reason) you require a positional argument interpreted as a bool, you can explicitly set Arg(long=None, action=ArgAction.set, num_args=1).
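The bool rules above can be sketched as a small function mapping each long flag name to the action it triggers, plus a helper that applies those actions to an argv list. This is a hypothetical, stdlib-only illustration rather than cappa's code: infer_bool_actions, resolve_bool, and the string stand-ins for ArgAction values are invented here, and the dash-cased default flag name is an assumption.

```python
from typing import Dict, List, Optional

STORE_TRUE = "store_true"    # stand-in for ArgAction.store_true
STORE_FALSE = "store_false"  # stand-in for ArgAction.store_false


def infer_bool_actions(
    name: str, default: Optional[bool] = None, long: Optional[List[str]] = None
) -> Dict[str, str]:
    """Map each long flag name of a bool field to the action it triggers."""
    # Assumption: the default long name is --<name>, with underscores as dashes.
    names = long if long is not None else ["--" + name.replace("_", "-")]
    has_inverse = any(flag.startswith("--no-") for flag in names)
    actions = {}
    for flag in names:
        if flag.startswith("--no-"):
            # An explicit --no-<name> variant always stores False.
            actions[flag] = STORE_FALSE
        elif default is True and not has_inverse:
            # foo: bool = True with no inverse: --foo must be able to produce False.
            actions[flag] = STORE_FALSE
        else:
            actions[flag] = STORE_TRUE
    return actions


def resolve_bool(argv: List[str], actions: Dict[str, str], default: bool = False) -> bool:
    """Apply the flags found in argv in order; in this sketch the last one wins."""
    value = default
    for token in argv:
        if token in actions:
            value = actions[token] == STORE_TRUE
    return value
```

With foo: bool = True, infer_bool_actions("foo", default=True) maps --foo to store_false, so passing --foo is the only way such a field can end up False.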

Sequence Types

Annotations like list[...], set[...], tuple[...], etc., are what we call “sequence types”.

  • In the case of a positional argument, a sequence type annotation implies Arg(..., num_args=length_bound).

    For “bounded” length sequences (i.e. tuples like tuple[int], tuple[int, int], tuple[int, int, int], etc), length_bound corresponds to the indicated length of the sequence.

    For “unbounded” length sequences (i.e. list, set, and unbounded tuples: tuple[int, ...]), length_bound=-1, i.e. the argument will consume an unbounded number of positional arguments (prog foo bar baz ...).

    from dataclasses import dataclass

    import cappa

    @dataclass
    class Prog:
        foo: list[str]
    
    prog = cappa.parse(Prog, argv=['foo', 'bar', 'baz'])
    assert prog == Prog(foo=['foo', 'bar', 'baz'])
    
  • In the case of option arguments, a sequence type annotation implies Arg(..., action=ArgAction.append). That is, it allows multiple uses of the option to accumulate the values into a sequence.

    from dataclasses import dataclass
    from typing import Annotated

    import cappa
    from cappa import Arg

    @dataclass
    class Prog:
        foo: Annotated[list[str], Arg(short=True)]
    
    prog = cappa.parse(Prog, argv=['-f', 'foo', '-f', 'bar', '-f', 'baz'])
    assert prog == Prog(foo=['foo', 'bar', 'baz'])
    

    Note

    You can specify Annotated[list[str], Arg(short=True, num_args=n)] where n would yield a sequence (-1 or > 1). In such a case, action would instead be inferred as ArgAction.set.

See Argument for more details on the difference between ArgAction.append and num_args=-1.
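The following minimal sketch shows how these options could be derived from a sequence annotation with typing.get_origin and typing.get_args. The function names are hypothetical, the actions are plain strings standing in for ArgAction values, and it only covers list, set, and tuple.

```python
import typing
from typing import Optional


def length_bound(annotation) -> int:
    """Inferred num_args for a positional sequence type; -1 means unbounded."""
    origin = typing.get_origin(annotation)
    args = typing.get_args(annotation)
    if origin is tuple:
        # tuple[int, ...] is unbounded; tuple[int, int] has a fixed length.
        if len(args) == 2 and args[1] is Ellipsis:
            return -1
        return len(args)
    if origin in (list, set):
        return -1
    raise TypeError(f"not a sequence type: {annotation!r}")


def infer_sequence_options(annotation, positional: bool, num_args: Optional[int] = None) -> dict:
    """Hypothetical sketch of the Arg options inferred for a sequence annotation."""
    if positional:
        return {"num_args": length_bound(annotation)}
    if num_args is not None and (num_args == -1 or num_args > 1):
        # An option that consumes several values per use is "set", not "append".
        return {"action": "set", "num_args": num_args}
    # Otherwise repeated uses accumulate: -f a -f b -> [a, b].
    return {"action": "append"}
```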

| None or Optional[...]

Either form of Optional-type annotation implies Arg(required=False, default=None).
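Both spellings can be detected with typing.get_origin, although they report different origins at runtime. The sketch below is hypothetical (infer_optional is not a cappa function) and only illustrates the rule stated above.

```python
import types
import typing

NoneType = type(None)

# typing.Optional[X] and typing.Union[X, None] report typing.Union as their origin;
# the `X | None` syntax (Python 3.10+) reports types.UnionType instead.
UNION_ORIGINS = (typing.Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


def infer_optional(annotation) -> dict:
    """Arg options implied by an Optional-type annotation (empty if not Optional)."""
    if typing.get_origin(annotation) in UNION_ORIGINS and NoneType in typing.get_args(annotation):
        return {"required": False, "default": None}
    return {}
```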

Unions

Unions don’t currently apply any specific inference behavior, but they do come with some restrictions.

  • Unioning “scalar” and “sequence” types will raise a ValueError.

    For example, int | list[str]: list[str] wants to produce a list, whereas int wants to produce a single value, so it’s unclear how the parser ought to react to such an annotation.

  • Unioning types which would produce different inferred num_args values will raise a ValueError.

    For example, foo: tuple[str, str] | list[str]. tuple[str, str] produces num_args=2 and list[str] produces num_args=-1, which are incompatible.
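Both restrictions can be checked up front, before any parsing happens. The sketch below is a hypothetical, stdlib-only illustration (check_union and inferred_num_args are invented names); it assumes None members are ignored because they only mark the field as optional, and it treats every scalar as consuming one value.

```python
import types
import typing

NoneType = type(None)
UNION_ORIGINS = (typing.Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())
SEQUENCE_ORIGINS = (list, set, tuple)


def inferred_num_args(annotation) -> int:
    """Scalars consume 1 value; tuples their length; list/set/tuple[T, ...] are unbounded (-1)."""
    origin = typing.get_origin(annotation)
    args = typing.get_args(annotation)
    if origin is tuple:
        return -1 if len(args) == 2 and args[1] is Ellipsis else len(args)
    if origin in (list, set):
        return -1
    return 1


def check_union(annotation) -> None:
    """Raise ValueError for unions the parser could not handle unambiguously."""
    if typing.get_origin(annotation) not in UNION_ORIGINS:
        return
    # None only marks the field as optional, so it is ignored here.
    members = [m for m in typing.get_args(annotation) if m is not NoneType]
    if len({typing.get_origin(m) in SEQUENCE_ORIGINS for m in members}) > 1:
        raise ValueError(f"cannot union scalar and sequence types: {annotation!r}")
    counts = {inferred_num_args(m) for m in members}
    if len(counts) > 1:
        raise ValueError(f"union members infer different num_args {sorted(counts)}: {annotation!r}")
```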

Literals and Enums

Any form of explicit “choice”, like Literal["one", "two"], Literal["one"] | Literal["two"], or an Enum subclass, implies Arg(choices=[...]).

For literals, the choices are all of the unioned literal values. For Enum subclasses, all variants are given as the set of choices.

choices represents a parser-level evaluation of the CLI input value versus the available choices, resulting in a parse error in the event the value does not match.
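A minimal sketch of how choices could be collected from Literal, unions of Literal, and Enum annotations, followed by the parser-level check. infer_choices and check_choice are hypothetical names, and using the Enum members' values as the choices is an assumption.

```python
import enum
import types
import typing

NoneType = type(None)
UNION_ORIGINS = (typing.Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


def infer_choices(annotation) -> typing.Optional[typing.List[str]]:
    """The string choices implied by a Literal/Enum annotation, or None."""
    origin = typing.get_origin(annotation)
    if origin is typing.Literal:
        return [str(value) for value in typing.get_args(annotation)]
    if origin in UNION_ORIGINS:
        choices = []
        for member in typing.get_args(annotation):
            if member is NoneType:
                continue  # Optional[...] does not add a choice.
            member_choices = infer_choices(member)
            if member_choices is None:
                return None  # Not every member is a "choice" type.
            choices.extend(member_choices)
        return choices
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        # Assumption: Enum choices are the members' values.
        return [str(member.value) for member in annotation]
    return None


def check_choice(raw: str, choices: typing.List[str]) -> str:
    """Parser-level validation: reject any value outside the choices."""
    if raw not in choices:
        raise ValueError(f"invalid choice: {raw!r} (choose from {', '.join(choices)})")
    return raw
```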

Subcommands

Subcommand options are expressed as a union of the individual subcommands, wrapped in Subcommands: Subcommands[One | Two | Three]. See the Command docs for more details.

The order of the unioned subcommand options has no effect, because each subcommand has an unambiguous name.

Mapping Inference

Mapping inference is effectively a subset of Arg-settings inference: it uses the annotated type to set Arg(parse=...) so that the raw values coming out of a successful CLI parse are mapped into the annotated types.

As such, you can again opt out of mapping inference entirely by supplying your own parse function.

Note

Mapping inference is built up out of component functions defined in cappa.annotation, such as parse_list, which know how to translate list[int] and a source list of raw parser strings into a list of ints.

These functions can also be used within your own custom parse functions.

Basic Scalar Types

Any “basic” data type (like int, float, or str) is supported and coerced directly, by calling its constructor.

This also applies to more “complex” types whose constructor accepts a string and returns an instance of the type (such as pathlib.Path or decimal.Decimal).

Note this also applies to Enums, whose raw values should, by map time, be guaranteed to be compatible with the Enum’s variants, since the parser has already checked them against the inferred choices.
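For these types, mapping boils down to calling the annotated type with the raw string. The sketch below illustrates that; map_scalar and the Color enum are hypothetical examples, not part of cappa.

```python
import decimal
import enum
import pathlib


class Color(enum.Enum):  # Hypothetical example Enum.
    RED = "red"
    GREEN = "green"


def map_scalar(annotation: type, raw: str):
    """Coerce a raw CLI string by calling the annotated type's constructor."""
    return annotation(raw)


examples = [
    (int, "4"),
    (float, "1.5"),
    (pathlib.Path, "src/app.py"),
    (decimal.Decimal, "0.10"),
    (Color, "red"),  # Enum(value) looks a member up by its value.
]
for annotation, raw in examples:
    print(annotation.__name__, repr(map_scalar(annotation, raw)))
```

A constructor that rejects its input, such as int("four"), raises ValueError, which is what lets the union handling in the next section move on to another type.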

Union (and Optional)

Unions are handled by recursively processing each of the unioned inner types, using whatever logic applies to that type. The important point here is that the order of the unioned types can matter, but generally shouldn’t.

The unioned types are sorted and processed in the following order, first to last:

  • “other” types/None

  • float

  • int

  • bool/str

This specific order is chosen to prioritize types which are less likely to succeed in parsing when given incorrect input.

bool and str are notable in that they will always “succeed” (given any input, they produce an output) and never fail. As such, it may not make much sense to union those types together without an explicit parse.

Therefore, when unioning “other” types, it may be important to consider the order of the unioned types, in case parsing as one type would incorrectly “succeed” before another gets a chance. In such cases, an explicit parse may be appropriate.
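The ordering rule can be sketched as a priority key plus a loop that returns the first successful conversion. This is a hypothetical illustration (union_priority and map_union are invented names) that follows the listed order literally; it treats any TypeError or ValueError from a constructor as a failed attempt, and represents an omitted value as None.

```python
import typing

NoneType = type(None)


def union_priority(member) -> int:
    """Lower values are tried first, following the order listed above."""
    if member is float:
        return 1
    if member is int:
        return 2
    if member in (bool, str):
        return 3
    return 0  # "other" types and None


def map_union(members: typing.Sequence[type], raw: typing.Optional[str]):
    """Try each member in priority order; the first successful conversion wins."""
    errors = []
    # sorted() is stable, so types sharing a priority keep their declared order.
    for member in sorted(members, key=union_priority):
        if member is NoneType:
            if raw is None:
                return None
            continue
        if raw is None:
            continue
        try:
            return member(raw)
        except (TypeError, ValueError) as exc:
            errors.append(f"{member.__name__}: {exc}")
    raise ValueError(f"could not map {raw!r}: {'; '.join(errors)}")
```

The pitfall described above shows up with map_union([pathlib.Path, str], "x"): it returns a Path, because Path accepts any string before str is ever tried.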

Note

It’s possible for more complex value mapping to happen automatically, if the input types have distinct “success criteria”. This amounts to something akin to a discriminated union.

For example, take the annotation tuple[Literal["foo"], str] | tuple[Literal["bar"], int]. Supplying foo bar as the input value should produce ("foo", "bar"), whereas bar 4 should produce ("bar", 4).
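This behavior falls out of recursive mapping: a tuple member fails as soon as its Literal position doesn't match, and the next member is tried. The sketch below is hypothetical (map_value is an invented name) and, for brevity, tries union members in declaration order rather than applying the priority ordering.

```python
import types
import typing

UNION_ORIGINS = (typing.Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


def map_value(annotation, raw):
    """Recursively map raw CLI strings (or lists of them) onto an annotation."""
    origin = typing.get_origin(annotation)
    args = typing.get_args(annotation)
    if origin is typing.Literal:
        for value in args:
            if str(value) == raw:
                return value
        raise ValueError(f"{raw!r} is not one of {args}")
    if origin is tuple:
        if len(raw) != len(args):
            raise ValueError(f"expected {len(args)} values, got {len(raw)}")
        return tuple(map_value(item_type, item) for item_type, item in zip(args, raw))
    if origin in UNION_ORIGINS:
        # Declaration order only; a failing member just hands over to the next one.
        for member in args:
            try:
                return map_value(member, raw)
            except ValueError:
                continue
        raise ValueError(f"{raw!r} matches no member of {annotation!r}")
    return annotation(raw)
```

With Pair = typing.Union[tuple[typing.Literal["foo"], str], tuple[typing.Literal["bar"], int]], map_value(Pair, ["foo", "bar"]) gives ("foo", "bar") and map_value(Pair, ["bar", "4"]) gives ("bar", 4).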

List/Tuple/Set

list[...], tuple[...], set[...] all will coerce the parsed sequences of values into their corresponding type. The inner type will be mapped for each item in the sequence.
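A minimal sketch of that coercion, assuming the inner types are plain scalars whose constructors accept a string. map_sequence is a hypothetical name; it handles list, set, fixed-length tuples, and tuple[T, ...].

```python
import typing


def map_sequence(annotation, raw_values: typing.List[str]):
    """Coerce parsed raw strings into list/set/tuple, mapping each item's inner type."""
    origin = typing.get_origin(annotation)
    args = typing.get_args(annotation)
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            # tuple[T, ...]: every item shares one inner type.
            return tuple(args[0](raw) for raw in raw_values)
        if len(args) != len(raw_values):
            raise ValueError(f"expected {len(args)} values, got {len(raw_values)}")
        # tuple[A, B]: each position has its own inner type.
        return tuple(item_type(raw) for item_type, raw in zip(args, raw_values))
    if origin in (list, set):
        (inner,) = args
        return origin(inner(raw) for raw in raw_values)
    raise TypeError(f"not a sequence type: {annotation!r}")
```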

typing.BinaryIO/typing.TextIO

BinaryIO and TextIO are used to produce an open file handle to the file path given by the CLI input for that argument.

This can be thought of as equivalent to open("foo.py"), given some CLI input --foo foo.py, and is roughly equivalent to the FileType feature from argparse.

import typing
from dataclasses import dataclass

import cappa


@dataclass
class Args:
    foo: typing.BinaryIO


args = cappa.parse(Args)
with args.foo:
    print(args.foo.read())

Note

The supported types do not map to concrete, instantiable types. This matters because neither of these types would otherwise be a valid type annotation in the context of cappa’s other inference rules.

It’s also important, because there are no concrete types which correspond to the underlying types returned by open(), which would allow the distinction between binary and text content.

Controlling open(...) options like mode="w"

An un-Annotated IO type translates to open(<cli value>) with no additional arguments, with the exception that BinaryIO infers mode='b'.

In order to directly customize arguments like mode, buffering, encoding, and errors, a FileMode must be annotated on the input argument.

import dataclasses
import typing
import cappa

@dataclasses.dataclass
class Args:
    # encoding only applies to text mode; open() rejects it for binary modes.
    foo: typing.Annotated[typing.TextIO, cappa.FileMode(mode='w', encoding='utf-8')]
    bar: typing.Annotated[typing.BinaryIO, cappa.Arg(short=True), cappa.FileMode(mode='wb')]

As shown, FileMode is annotated much like an Arg, and can be used alongside one, depending on the details of the argument in question.
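Putting the IO rules together, the sketch below shows how an IO annotation plus an optional set of open() options could translate into a call to the built-in open(). FileModeSketch and open_for are hypothetical stand-ins (not cappa.FileMode itself), and defaulting to read mode, with "rb" for BinaryIO, is an assumption.

```python
import typing
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileModeSketch:
    """Hypothetical stand-in for cappa.FileMode with the options named above."""

    mode: str = "r"
    buffering: int = -1
    encoding: Optional[str] = None
    errors: Optional[str] = None


def open_for(annotation, path: str, file_mode: Optional[FileModeSketch] = None):
    """Open the CLI-supplied path according to the IO annotation."""
    if file_mode is None:
        # Assumption: read mode by default, binary for BinaryIO.
        file_mode = FileModeSketch(mode="rb" if annotation is typing.BinaryIO else "r")
    return open(
        path,
        mode=file_mode.mode,
        buffering=file_mode.buffering,
        encoding=file_mode.encoding,
        errors=file_mode.errors,
    )
```

Note that encoding only applies to text modes: Python's open() raises ValueError when a binary mode such as "wb" is combined with an encoding.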