Tutorial#

This tutorial will guide you throughout writing a simple vortex script. This script will result in the execution of a Python program whose behavior resembles this of a numerical weather prediction forecast program.

To run succesfully, the program must be able to read two files in the current directory:

A file named ICSHMFCSINIT, containing input data required by the program to run. You can think of it as a file containing forecast initial condition data.
A file named fort.4, describing configuration keys and values required by the program. In this example, this file only specifies the value of a configuration key TERM which determines how many result files will be written to disk. You can think of this configuration as the analoguous of a forecast term.

Therefore, prior to running our fake forecast program, our working directory must look like this:

fake-forecast.py
fort.4
ICMSHFCSINIT

Supposing the content of the configuration file fort.4 is

# fort.4
TERM=3

after running the fake forecast program, the working directory will contain three extra files to look like this:

fake-forecast.py
fort.4
ICMSHFCSINIT
ICMSHFCST+01:00.grib
ICMSHFCST+02:00.grib
ICMSHFCST+03:00.grib

Initial set up#

To work through this tutorial, you will need to download the input data archive and extract its content on your computer. The exctraction location doesn’t matter, as long as it is somewhere where you have read and write access.

The archive contains the following files:

fake-forecast.py: A Python script that reads in a data file and a configuration file, both expected to be present in the current directory, and writes a set of files ICMSHFCST+01:00.grib, ICMSHFCST+02:00.grib, ICMSHFCST+03:00.grib. The number of output files is specified by the content of the configuration file.
forecast_configuration_files/main_arpege.nam: A configuration file consisting of one KEY=VALUE pair per line.
data_tree: The root directory for the vortex data tree, from which input files are fetched from and ouput files written to.

Note

The examples in this tutorial assume that the tutorial data archive was extracted in directory /home/user. Be sure to replace this path by the path to the directory where your extracted the tutorial data archive.

This tutorial is a guide to writing a Python script, using the vortex library, to fetch required input data from the data tree directory, run the fetch-forecast.py program, and write output files to the data tree.

Start by creating a empty directory. The location and name do not matter and we’ll just call it vortex-tutorial. Using your favorite text editor, create a new Python file run-forecast.py and write the following lines to it:

# run-forecast.py

import vortex as vtx

vtx.config.set_config(
    section="data-tree",
    key="rootdir",
    # Be sure to replace "/home/user/" by the path where you
    # extracted the tutorial data archive.
    value="/home/user/vortex-tutorial-data/vortex_data_tree",
)

print(
    "The data tree root is",
    vtx.config.from_config("data-tree", "rootdir"),
)

The call to vortex.config.set_config() specifies the location of the vortex data tree, a directory hierarchy that holds data files that vortex can fetch data from and write data to. By default, the data tree is located in the user’s home directory, but for the purpose of this tutorial you will configure the data tree root node to be the directory vortex_data_tree located within the tutorial data files.

Tip

You can run the script with python -i to execute the script in an interactive Python session.

Fetching input data#

Use the vortex.input() function to define an input resource for the initial condition input file:

handlers = vtx.input(
    kind="analysis",
    date="2024082600",
    model="arpege",
    cutoff="production",
    filling="atm",
    geometry="global1798",
    nativefmt="grib",
    vapp="tutorial",
    vconf="fake-forecast",
    experiment="vortex-tutorial",
    block="4dupd2",
    local="ICMSHFCSTINIT",
)

The vortex.input() function returns a list of objects of type Handler. In our case, this list contains only a single item mapping to the initial condition file.

initial_condition = handlers[0]

An instance of Handler is able to compute the file path to the underlying physical file:

initial_condition.locate()

This path is computed from the values of the arguments passed to the vortex.input() function. This path can be computed because the initial condition file is a ressource that was stored by another vortex script. Its location is therefore well defined within a standardised data tree layout, see The vortex data tree.

Note

The vortex.input() function does not actually fetch the corresponding file into the current working directory, it only defines (a) Handler object(s) that provide(s) access to the get method.

To fetch the file into the current working directory, use the get method on the resource handler:

initial_condition.get()

The second step is to fetch the configuration file in the current working directory, as a file named fort.4 since this is what the fake forecast expects.

Similarly to the initial condition file, use the vortex.input() function again:

config_file = vtx.input(
    kind="namelist",
    model="arpege",
    remote="/home/user/vortex-tutorial-data/forecast_configuration_files/main_arpege.nam",
    local="fort.4",
)[0]

Attention

Be sure to replace /home/user by the path to the directory where you extracted the tutorial data.

The call to vortex.input() is much simpler. This time, the path to the configuration file is specified explicitly using the remote argument, instead of being computed by vortex from the arguments of vortex.input().

Running the fake forecast program#

With the input data files copied into the current working directory, you are now ready to run the program. You will first fetch the program itself – in this case a Python script – into the current working directory, then instanciate an algorithmic component object which will be responsible to actually run the script.

Fetching the fake forecast program#

The VORTEX library considers programs, whether they are scripts written in interpreted languages or compiled binaries, as executables. Fetching an executable is similar to fetching an input data file:

exe = vtx.executable(
    kind="script",
    language="python",
    # Replace "/home/user" by the path to the directory you
    # extracted the tutorial data to.
    remote="/home/user/vortex-tutorial-data/fake-forecast.py",
    local="fake-forecast.py",
)[0]

Similarly to vortex.input(), vortex.executable returns a list of instances of the Handler class, which you can call get on:

# Fetch the Python script into the current working directory
exe.get()

Running the script through an algo component#

The vortex library provides a collection of classes that define how to run specific programs. These classes are referred to as algorithmic components.

Algorithmic components classes are instanciated using the vortex.task() function:

task = vtx.task(
    interpreter="python",
    engine="exec",
)

With interpreter="python" and engine="exec", the vtx.algo returns an instance of vortex.algo.components.Expresso. This class encapsulates behavior required the run a Python script, potentially setting up environment variables like PYTHONPATH or switching to a different Python interpreter.

Finally, the script can be run using the run method on the task object, which takes an executable object as a argument.

task.run(exe)

At this point, the script ran and produced 3 files ICMSHFCST+01:00.grib, ICMSHFCST+02:00.grib and ICMSHFCST+03:00.grib in the current working directory. The next step is to store them into the vortex data tree, so that they can be retrieved later by other vortex scripts.

Storing outputs into the data tree#

In this section we use the vortex.output() function to store the files generated by the fake forecast program into the vortex data tree. This way, subsequent vortex scripts will be able to retrieve them using the vortex.input() function.

Storing files in the data tree is achieved by calling the vortex.output(). Its interface is identical to vortex.input()’s:

historic_files = vtx.output(
    kind="modelstate",
    date="2024082600",
    model="arpege",
    cutoff="production",
    geometry="global1798",
    nativefmt="grib",
    vapp="tutorial",
    vconf="fake-forecast",
    experiment="vortex-tutorial",
    term=[1, 2, 3],
    block="forecast",
    local="ICMSHFCST+[term].grib",
)

The vortex.output() function returns a list Handlers instances whose put method works in the opposite direction of get: instead of reading files from the data tree, it writes to it files present in the current working directory that are named as the value passed to the local argument to vortex.output().

Note the addition of the argument term, also referenced within the string passed to local:

 historic_files = vtx.output(
   # ...
   term=[1, 2, 3],
   local="ICMSHFCST+[term].grib",
)

Values of arguments to functions such as vortex.input(), vortex.output() or vortex.executable() can reference the values of other arguments. Sequence are expanded into as many elements as they contain. In this case, vtx.output returns a list of 3 Handler objects instead of a single object.

Finally, calling put on the handlers will write the files into the data tree:

for handler in historic_files:
    handler.put()

You can now list the content of the forecast block to check that the 3 files where indeed written there:

DATATREE_ROOT=<tutorial/data>/vortex_data_tree
ls -l $DATATREE_ROOT/tutorial/fake-forecast/vortex-tutorial/20240826T0000P/forecast

Setting default values#

Definitions of vortex inputs and outputs often feature the same arguments and values. Vortex provides the vortex.defaults() function, which can be used to prevent repeating arguments to functions vortex.input(), vortex.output() or vortex.executable().

Using vortex.defaults(), the script becomes:

import vortex as vtx

vtx.config.set_config(
    section="data-tree", key="rootdir",
    value="/home/user/vortex-tutorial-data/vortex_data_tree",
 )

vtx.defaults(
    date="2024082600",
    model="arpege",
    cutoff="production",
    geometry="global1798",
    nativefmt="grib",
    vapp="tutorial",
    vconf="fake-forecast",
    experiment="vortex-tutorial",
)

initial_condition = vtx.input(
    kind="analysis",
    filling="atm",
    local="ICMSHFCSTINIT",
    block="4dupd2",
)[0]
initial_condition.get()

config_file = vtx.input(
    kind="namelist",
    remote="/home/user/vortex-tutorial-data/forecast_configuration_files/main_arpege.nam",
    local="fort.4",
)[0]
config_file.get()

exe = vtx.executable(
    kind="script",
    language="python",
    remote="/home/user/vortex-tutorial-data/fake-forecast.py",
    local="fake-forecast.py",
)[0]
exe.get()

vtx.task(interpreter="python", engine="exec").run(exe)

for output_handler in vtx.output(
    kind="modelstate",
    nativefmt="grib",
    local="ICMSHFCST+[term].grib",
    block="forecast",
    term=[1, 2, 3],
):
    output_handler.put()

Attention

Be sure to replace "/home/user" by the path to the directory you extracted the tutorial data to.

A post-processing task#

We conclude this tutorial by implementing a second vortex script, which will illustrate the way outputs of one vortex script can be used as inputs of another.

This new vortex script will:

fetch all three forecast output files
concatenate them
write the resulting file back into the data tree

Open a new file aggregate-task.py and start with calling vortex.input():

 import vortex as vtx

 vtx.config.set_config(
    section="data-tree",
    key="rootdir",
    # Be sure to replace "/home/user/" by the path where you
    # extracted the tutorial data archive.
    value="/home/user/vortex-tutorial-data/vortex_data_tree",
)

 vtx.defaults(
     date="2024082600",
     model="arpege",
     cutoff="production",
     vapp="tutorial",
     vconf="fake-forecast",
     experiment="vortex-tutorial",
     geometry="global1798",
 )

 historic_files = vtx.output(
     kind="modelstate",
     nativefmt="grib",
     term=[1, 2, 3],
     local="ICMSHFCST+[term].grib",
     block="forecast",
 )

 for handler in historic_files:
     handler.get()

Note

Because the location of the data tree root is different from the default $HOME/.vortex.d, it is necessary to call vortex.config.set_config() again at the beginning of the script.

For convenience, we could instead use the default location or specify the location of the data tree in the configuration.

Observe that the arguments specified are identical to those provided to the vortex.output() function in section Setting default values.

With the three files present in the working directory, let’s concatenate them into a new file result.txt:

with open("result.txt", "w") as target:
    for handler in historic_files:
        with open(handler.container.localpath(), "r") as source:
            target.writelines(source.readlines())

For the sake of this tutorial, let’s pretend that we want to interpret the output file result.txt as a horizontal diagnostics resource, and write it into the data tree as a such:

# Reminder: vortex.output returns a list, even if
# there is only one element in it.
for output in vtx.output(
    kind="ddh",
    scope="global",
    nativefmt="lfi",
    block="postprocessing",
    term=3,
    local="result.txt",
):
     # Write content of file "result.txt" in current working directory
     # as file ddh.arpege-global.tl1798-c22+0003:00.lfi
     # in the data tree
     output.put()

DATATREE_ROOT=/home/user/vortex-tutorial-data/vortex_data_tree
ls -l $DATATREE_ROOT/tutorial/fake-forecast/vortex-tutorial/20240826T0000P/postprocessing

Note

The use of the DDH resource in the previous example is arbitrary, for the sole purpose of illustrating the storage of the result of a postprocessing operation into the data tree. The content of file ddh.arpege-global.tl1798-c22+0003:00.lfi is not really this of a true horizontal diagnostics file.

Tutorial

Contents

Tutorial#

Initial set up#

Fetching input data#

Running the fake forecast program#

Fetching the fake forecast program#

Running the script through an algo component#

Storing outputs into the data tree#

Setting default values#

A post-processing task#