# Annotations

Annotations perform two functions:

- Inferring specific `Arg`-level options, depending on the specific annotation, most of which control parsing behavior for that argument.
- Mapping the raw parsed arguments into the specified output types.

If the inference system's "supported" set of behaviors does not produce the obvious result, feel free to submit a bug report. The intent is for inference to feel natural in every supportable case.

For cases where it does not make sense to build in specific behavior for a type (for example, third-party library types), you can still use the correct annotation while explicitly providing `parse` or other [arguments](./arg.md).

```{note}
`Annotated` is used pervasively throughout the examples and docs. When using versions of Python below 3.9, `Annotated` is not included in the `typing` module. Instead, you can install [typing_extensions](https://pypi.org/project/typing_extensions/), which backports typing objects to earlier versions of Python.
```

```{note}
Similarly, the docs universally prefer `|` for `Union`s. When using versions of Python below 3.10, you may need to add `from __future__ import annotations` at the top of your file so that the annotations are not evaluated by the Python runtime.
```

## Positional vs Option

Below, any reference to "positional" means a field which does not set `short=...` or `long=...`. This includes any fields with no explicit `Arg` instance at all.

- `foo: str`
- `foo: Annotated[str, Arg()]`

By contrast, an "option" is a field which **does** set `short=...` or `long=...`.

- `foo: Annotated[str, Arg(short=True)]`
- `foo: Annotated[str, Arg(long=True)]`
- `foo: Annotated[str, Arg(short="-f", long=True)]`

This is an important distinction, because the inference system makes different decisions depending on whether a given field is positional or not.
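The same positional/option split exists in plain `argparse`, which may help build intuition. The sketch below is an analogy only: it uses the standard library rather than cappa, and the cappa annotations in the comments are only the rough counterparts of each `argparse` call.

```python
import argparse

# Analogy only: plain argparse, not cappa.
parser = argparse.ArgumentParser(prog="prog")
# Positional: identified purely by its position (compare `foo: str`)
parser.add_argument("foo")
# Option: identified by its flag (roughly `bar: Annotated[str, Arg(short=True, long=True)]`)
parser.add_argument("-b", "--bar")

ns = parser.parse_args(["value", "--bar", "other"])
print(ns.foo, ns.bar)  # value other

# Options can appear before or after positionals
ns = parser.parse_args(["-b", "other", "value"])
print(ns.foo, ns.bar)  # value other

# An omitted option falls back to its default (None in argparse)
print(parser.parse_args(["value"]).bar)  # None
```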
## `Arg` Inference

In all cases below, any individually inferred value can be overridden whenever the default inference is not what you're looking for!

### Bool

For most types, an un-annotated field like `foo: str` is assumed to be a positional argument. By contrast, `foo: bool` is assumed to be an optional flag, because flags are by far the more common use of a bool.

By default, `foo: bool` implies `foo: Annotated[bool, Arg(long=True, num_args=0, action=ArgAction.store_true, required=False)]`.

- If you also want to enable a "short" version of the flag, you can set `Arg(short=True)`.
- If you include a `--no-` long name variant like `Arg(long=['--foo', '--no-foo'])`, the `--no-foo` variant will be inferred as `ArgAction.store_false`.
- If a vanilla bool annotation (without an inverted variant) is combined with a default value of `True`, then the action is inverted to `ArgAction.store_false`. This is because `foo: bool = True` could never result in a `False` value if supplying `--foo` at the command line also stored `True`.

See [Arg.short](cappa.Arg.short) and [Arg.long](cappa.Arg.long) for more details on customizing the specific flag names.

```{note}
If (for whatever reason) you require a positional argument interpreted as a bool, you can explicitly set `Arg(long=None, action=ArgAction.set, num_args=1)`.
```

### Sequence Types

Annotations like `list[...]`, `set[...]`, `tuple[...]`, etc., are what we call "sequence types".

- In the case of a positional argument, a sequence type annotation implies `Arg(..., num_args=length_bound)`.

  For "bounded" length sequences (i.e. tuples like `tuple[int]`, `tuple[int, int]`, `tuple[int, int, int]`, etc.), `length_bound` corresponds to the indicated length of the sequence.

  For "unbounded" length sequences (i.e. list, set, and unbounded tuples: `tuple[int, ...]`), `length_bound=-1`, i.e. the argument will consume an unbounded number of positional arguments (`prog foo bar baz ...`).

```python
from dataclasses import dataclass

import cappa


@dataclass
class Prog:
    foo: list[str]


prog = cappa.parse(Prog, argv=['foo', 'bar', 'baz'])
assert prog == Prog(foo=['foo', 'bar', 'baz'])
```

- In the case of option arguments, a sequence type annotation implies `Arg(..., action=ArgAction.append)`. That is, it allows multiple uses of the option to accumulate the values into a sequence.

```python
from dataclasses import dataclass
from typing import Annotated

import cappa
from cappa import Arg


@dataclass
class Prog:
    foo: Annotated[list[str], Arg(short=True)]


prog = cappa.parse(Prog, argv=['-f', 'foo', '-f', 'bar', '-f', 'baz'])
assert prog == Prog(foo=['foo', 'bar', 'baz'])
```

```{note}
You can specify `Annotated[list[str], Arg(short=True, num_args=n)]` where `n` would yield a sequence (`-1` or > 1). In such a case, `action` would instead be inferred as `ArgAction.set`.
```

See [Argument](./arg.md) for more details on the difference between `ArgAction.append` and `num_args=-1`.

### `| None` or `Optional[...]`

Either form of `Optional`-type annotation implies `Arg(required=False, default=None)`.

### Unions

Unions don't currently apply any specific inference behavior, but they do come with some restrictions.

- Unioning "scalar" and "sequence" types will raise a `ValueError`. For example, in `int | list[str]`, `list[str]` wants to produce a list whereas `int` wants to produce a single value, and it's unclear how the parser ought to react to such an annotation.
- Unioning types which would produce different inferred `num_args` values will raise a `ValueError`. For example, in `foo: tuple[str, str] | list[str]`, `tuple[str, str]` produces `num_args=2` and `list[str]` produces `num_args=-1`, which are incompatible.

### Literals and Enums

Any form of explicit "choice", like `Literal["one", "two"]`, `Literal["one"] | Literal["two"]`, or an `Enum`, implies `Arg(choices=[...])`.

For literals, the choices are all of the unioned literal values. For `Enum` subclasses, all variants are given as the set of choices.
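To illustrate where such a list of choices could come from, here is a rough stdlib-only sketch using `typing.get_origin`/`typing.get_args`. `choices_for` and `Color` are hypothetical names, this is not cappa's implementation, and it assumes an `Enum`'s choices are its members' values (consistent with the constructor-based mapping described under Basic Scalar Types).

```python
import enum
import typing
from typing import Literal, Union


class Color(enum.Enum):
    red = "red"
    blue = "blue"


def choices_for(annotation) -> list:
    """Hypothetical helper: collect the allowed values of a choice-like annotation."""
    origin = typing.get_origin(annotation)
    if origin is Literal:
        # Literal["one", "two"] -> ["one", "two"]
        return list(typing.get_args(annotation))
    if origin is Union:
        # A union of Literals: recurse into each member and concatenate the results
        result = []
        for member in typing.get_args(annotation):
            result.extend(choices_for(member))
        return result
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        # Assumption: an Enum's choices are its members' values
        return [member.value for member in annotation]
    raise TypeError(f"{annotation!r} is not a choice-like annotation")


print(choices_for(Literal["one", "two"]))                  # ['one', 'two']
print(choices_for(Union[Literal["one"], Literal["two"]]))  # ['one', 'two']
print(choices_for(Color))                                  # ['red', 'blue']
```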
`choices` represents a parser-level evaluation of the CLI input value against the available choices, resulting in a parse error if the value does not match.

### Subcommands

Subcommand options are expressed as a union of the subcommand types: `Subcommands[One | Two | Three]`. See the [Command](./command.md) docs for more details.

The order of the unioned subcommand options has no effect, because each subcommand has an unambiguous name.

## Mapping Inference

Mapping inference is a sort of subset of `Arg`-settings inference, in that it effectively uses the annotated type to set `Arg(parse=...)` in a way that maps the raw values coming out of a successful CLI parse into the annotated types.

As such, you can again opt out of mapping inference entirely by supplying your own `parse` function.

```{note}
Mapping inference is built up out of component functions defined in `cappa.parse`, such as `parse_list`, which know how to translate `list[int]` and a source list of raw parser strings into a list of ints.

You can reuse these functions in your own custom `parse` functions.
```

### Basic Scalar Types

Any "basic" data type (like `int`, `float`, `str`) is supported and coerced directly, by calling its constructor.

This also applies to "complex" types whose constructor accepts a string and returns the type (such as `pathlib.Path` or `decimal.Decimal`).

Note this also applies to `Enum`s, whose raw values should, by map-time, be guaranteed to be compatible with the Enum's variants.

### `date`/`datetime`/`time`

All three types are directly supported through inference, by calling the `fromisoformat` method on each type.

```{note}
The set of supported input formats is [Python version specific](https://docs.python.org/3/library/datetime.html#datetime.date.fromisoformat).
```

To support more general input formats, you should instead supply a function to `Arg(parse=...)` which accepts a string and returns the given type.
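As a sketch of such a function, the hypothetical `parse_us_date` below accepts `MM/DD/YYYY` using `datetime.strptime`, alongside the ISO 8601 input that the inferred `fromisoformat` handles. The date format and function name are illustrative assumptions; only the `Arg(parse=...)` hook itself comes from this page.

```python
from datetime import date, datetime


def parse_us_date(raw: str) -> date:
    """Hypothetical custom parser: accepts MM/DD/YYYY instead of ISO 8601."""
    return datetime.strptime(raw, "%m/%d/%Y").date()


# The inferred behavior: ISO 8601 input via fromisoformat
print(date.fromisoformat("2024-03-15"))  # 2024-03-15

# A custom parse function accepting a different input format
print(parse_us_date("03/15/2024"))  # 2024-03-15

# In a cappa class, the function would be supplied along the lines of:
#     when: Annotated[date, Arg(parse=parse_us_date)]
```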
### Union (and Optional)

Unions are handled by recursively processing the set of unioned inner types, using whatever logic applies to each type.

The important point here is that the order of the unioned types **can** matter, but generally shouldn't. The unioned types are sorted and processed in the following order, highest priority first:

- "other" types/None
- float
- int
- bool/str

This specific order is chosen to prioritize types which are less likely to succeed parsing when given incorrect input. `bool` and `str` are notable in that they will always "succeed" (given any input, they will produce an output) and never fail. As such, it **may** not make much sense to union those types together without an explicit `parse`.

Therefore, when unioning "other" types, it **may** be important to consider the order of the unioned types, if parsing for one type could "succeed" incorrectly where another was intended. In such cases, `parse` may be appropriate.

```{note}
It's possible for more complex value mapping to happen automatically, if the input types have distinct "success criteria". This amounts to something akin to a discriminated union.

For example, take the annotation `tuple[Literal["foo"], str] | tuple[Literal["bar"], int]`. Supplying `foo bar` as the input value should produce `("foo", "bar")`, whereas `bar 4` should produce `("bar", 4)`.
```

### List/Tuple/Set

`list[...]`, `tuple[...]`, and `set[...]` will all coerce the parsed sequence of values into their corresponding type. The inner type will be mapped for each item in the sequence.

### `typing.BinaryIO`/`typing.TextIO`

[BinaryIO](typing.BinaryIO) and [TextIO](typing.TextIO) are used to produce an open file handle to the file path given by the CLI input for that argument.

For example, given `cli --foo foo.py`, this can be thought of as calling `open("foo.py")`, which is roughly equivalent to the [FileType](https://docs.python.org/3/library/argparse.html#argparse.FileType) feature from `argparse`.
```python
import typing
from dataclasses import dataclass

import cappa


@dataclass
class Args:
    foo: typing.BinaryIO


args = cappa.parse(Args)
with args.foo:
    print(args.foo.read())
```

```{note}
The supported types do not map to concrete, instantiable types. This matters because neither of these types would otherwise be a valid type annotation in the context of cappa's other inference rules. It also matters because there are no concrete types corresponding to the objects returned by `open()` which would allow distinguishing between binary and text content.
```

#### Controlling `open(...)` options like `mode="w"`

An un-`Annotated` IO type translates to `open()` with no additional arguments, with the exception that `BinaryIO` infers binary mode (`'b'`).

In order to directly customize arguments like `mode`, `buffering`, `encoding`, and `errors`, a [FileMode](cappa.FileMode) must be annotated on the input argument.

```python
import dataclasses
import typing

import cappa


@dataclasses.dataclass
class Args:
    # Text mode: `encoding` applies
    foo: typing.Annotated[typing.TextIO, cappa.FileMode(mode='w', encoding='utf-8')]
    # Binary mode: no `encoding` (open() rejects it for binary modes)
    bar: typing.Annotated[typing.BinaryIO, cappa.Arg(short=True), cappa.FileMode(mode='wb')]
```

As shown, [FileMode](cappa.FileMode) is annotated much like an [Arg](cappa.Arg), and can be used alongside one depending on the details of the argument in question.
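The options listed above are the familiar `open()` parameters, so their effect can be sketched with the standard library alone. `FileOptions` is a hypothetical stand-in, not cappa's `FileMode`, and the sketch assumes the annotated options are forwarded to `open()` unchanged. It also shows why `encoding` only makes sense with a text mode such as `mode='w'`.

```python
import dataclasses
import os
import tempfile
from typing import Optional


@dataclasses.dataclass
class FileOptions:
    """Hypothetical stand-in for the options FileMode carries (not cappa's class)."""

    mode: str = "r"
    buffering: int = -1
    encoding: Optional[str] = None
    errors: Optional[str] = None

    def open(self, path: str):
        # Forward every stored option to the builtin open()
        return open(
            path,
            mode=self.mode,
            buffering=self.buffering,
            encoding=self.encoding,
            errors=self.errors,
        )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")

    # Text mode: the encoding controls how str is turned into bytes
    with FileOptions(mode="w", encoding="utf-8").open(path) as f:
        f.write("héllo")

    # Binary mode: the raw encoded bytes come back
    with FileOptions(mode="rb").open(path) as f:
        data = f.read()

    # Binary modes reject an encoding, so mode="wb" cannot be paired with encoding=...
    try:
        FileOptions(mode="wb", encoding="utf-8").open(path)
        rejected = False
    except ValueError:
        rejected = True

print(data)      # b'h\xc3\xa9llo'
print(rejected)  # True
```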