Fetching and writing data files#
vortex facilitates the transfer of data files between an arbitrary location and the current working directory. This location can either be a leaf of the data tree or a specific path.
See also
This chapter describes how to fetch and write data files programmatically, using vortex as a library. This can also be done directly from the command line using the vtx command.
Fetching files from the data tree#
This is achieved by creating a resource handler using the input
function, then calling the get method on this handler.
The vortex.input function can take a large number of keyword
arguments, the list of which can be broken down into three categories:
Arguments specifying the type of resource, that is, which subclass of vortex.data.resources.Resource will eventually be instantiated as part of the handler’s creation.
Arguments specifying the location of the source data file, for example whether the resource is fetched from the local data tree, a remote data tree, or a specific path.
A single argument local specifying the name of the resulting file in the current working directory.
A call to vortex.input does not trigger any file transfer (<TODO>or
link?). It only returns a vortex.data.Handler object that
aggregates information about the underlying resource, source location
and target file. Fetching the file is only achieved by calling the
get method on a Handler object.
The following example fetches an initial condition file from experiment
xpid in the data tree into a file ICMSHFCSTINIT in the current
working directory:
import vortex as vtx

handler = vtx.input(
    kind="analysis",
    date="2024082600",
    model="arpege",
    cutoff="production",
    geometry="franmgsp",
    filling="atm",
    block="4dupd2",
    experiment="xpid",
    local="ICMSHFCSTINIT",
)
# Actually trigger data transfer
handler.get()
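The separation between describing a transfer and performing it can be illustrated without vortex. The sketch below is only an analogy: DeferredCopy is a hypothetical class, not part of vortex, and a plain file copy in a temporary directory stands in for the actual transfer.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DeferredCopy:
    """Hypothetical handler-like object: remembers a source and a target."""

    source: Path
    local: Path

    def get(self):
        # The copy happens here, not when the object is created.
        shutil.copy(self.source, self.local)
        return self.local


# Set up a throwaway source file to copy from.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.dat"
source.write_text("payload")

# Creating the object only records what should be copied where.
handler = DeferredCopy(source=source, local=workdir / "ICMSHFCSTINIT")
created_before_get = handler.local.exists()

# Calling get() performs the copy.
handler.get()
created_after_get = handler.local.exists()

print(created_before_get, created_after_get)
```

The target file only exists after the call to get, which mirrors the behaviour described above for Handler objects.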
Resource description expansion#
A call to vortex.input can refer to multiple source data files. For
instance, consider the case of fetching hourly forecast output files:
import vortex as vtx

handlers = vtx.input(
    kind="gridpoint",
    date="2024082600",
    model="arpege",
    origin="historic",
    cutoff="production",
    geometry="franmgsp",
    term=[1, 2, 3, 4, 5, 6],
    block="forecast",
    experiment="xpid",
    local="ICMSHFCSTINIT",
)
Setting the term argument to a list of six items means vortex.input
returns a list of six vortex.data.Handler objects. The
above call is in fact equivalent to:
handlers = [
    vtx.input(
        kind="gridpoint",
        date="2024082600",
        model="arpege",
        origin="historic",
        cutoff="production",
        geometry="franmgsp",
        term=term,
        block="forecast",
        experiment="xpid",
        local="ICMSHFCSTINIT",
    )
    for term in range(1, 7)
]
Expansion works across arguments. The following call
handlers = vtx.input(
    kind="gridpoint",
    date="2024082600",
    model="arpege",
    origin="historic",
    cutoff="production",
    geometry="franmgsp",
    term=[1, 2, 3, 4, 5, 6],
    block="forecast",
    experiment=["xpid1", "xpid2", "xpid3"],
    local="ICMSHFCSTINIT",
)
is syntactic sugar for
handlers = [
    vtx.input(
        kind="gridpoint",
        date="2024082600",
        model="arpege",
        origin="historic",
        cutoff="production",
        geometry="franmgsp",
        term=term,
        block="forecast",
        experiment=xp,
        local="ICMSHFCSTINIT",
    )
    for term in range(1, 7)
    for xp in ["xpid1", "xpid2", "xpid3"]
]
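The expansion rule can be pictured as a Cartesian product over the list-valued arguments. The following is a plain-Python illustration under that assumption, not vortex's implementation; the helper name expand_description is hypothetical, and it produces plain dictionaries rather than Handler objects.

```python
from itertools import product


def expand_description(**kwargs):
    """Expand list-valued arguments into one description per combination.

    Hypothetical helper for illustration only; it is not part of vortex.
    """
    keys = list(kwargs)
    # Treat scalars as one-item lists so every argument can be iterated.
    choices = [v if isinstance(v, list) else [v] for v in kwargs.values()]
    # One dictionary per element of the Cartesian product.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


descriptions = expand_description(
    kind="gridpoint",
    term=[1, 2, 3, 4, 5, 6],
    experiment=["xpid1", "xpid2", "xpid3"],
)

print(len(descriptions))  # 18, i.e. 6 terms x 3 experiments
print(descriptions[0])
```

Because term is listed before experiment, the product iterates over terms in the outer loop and experiments in the inner loop, which is the same order as the list comprehension above.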
It is also possible to refer to the value passed for a given argument
within the value of another. This is achieved using square brackets
[] whenever the value is a string. For instance:
handlers = vtx.input(
    kind="historic",
    # ...
    term=[0, 1, 2, 3],
    local="ICMSHFCST+[term::fmthm]",
)

for h in handlers:
    h.get()
The above results in four files, ICMSHFCST+0000:00,
ICMSHFCST+0001:00, ICMSHFCST+0002:00 and ICMSHFCST+0003:00, in the
current working directory.
The double :: syntax is used to execute a method call on the
resulting object. In the above example, [term] would refer to the
term attribute of the resource object, which is an instance of
Time. Specifying term::fmthm evaluates the fmthm method on the
Time object, resulting in the string 0000:00, 0001:00, 0002:00 or
0003:00, depending on the value of term.
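The substitution mechanism can be sketched with a regular expression that recognises [name] and [name::method] placeholders. This is an illustrative approximation, not vortex's code: the Term class is a hypothetical stand-in for Time that only handles whole hours, and render is a hypothetical helper.

```python
import re


class Term:
    """Hypothetical stand-in for the Time object, storing whole hours."""

    def __init__(self, hours):
        self.hours = hours

    def fmthm(self):
        # Format as HHHH:MM, for example 0001:00.
        return f"{self.hours:04d}:00"

    def __str__(self):
        return str(self.hours)


# Matches [name] or [name::method].
PLACEHOLDER = re.compile(r"\[(\w+)(?:::(\w+))?\]")


def render(template, description):
    """Replace each placeholder with the matching value from description."""

    def substitute(match):
        name, method = match.groups()
        value = description[name]
        if method is not None:
            # Look up the attribute, and call it if it is a method.
            value = getattr(value, method)
            if callable(value):
                value = value()
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


for hours in [0, 1, 2, 3]:
    print(render("ICMSHFCST+[term::fmthm]", {"term": Term(hours)}))
```

A placeholder without :: falls back to the string form of the value, so in this sketch render("file.[term]", {"term": Term(3)}) gives file.3.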
Writing data files to the data tree#
Transferring files to the data tree is the mirror operation to
fetching from it. It works very similarly, this time getting a
Handler object from the output function and calling put on the
handler.
The following example writes initial condition file ICMSHFCSTINIT in
the current working directory into the experiment xpid in the data
tree.
import vortex as vtx

handler = vtx.output(
    kind="analysis",
    date="2024082600",
    model="arpege",
    cutoff="production",
    geometry="franmgsp",
    filling="atm",
    block="4dupd2",
    experiment="xpid",
    local="ICMSHFCSTINIT",
)
# Actually trigger data transfer
handler.put()
Resource resolution#
The list of arguments passed to vortex.input or vortex.output is
arbitrary. However, a resource handler will only be successfully
instantiated if the arguments specifying the resource match the
attributes of an existing vortex.data.resources.Resource subclass.
As an example, consider the following call to vortex.input.
It is nearly identical to the vortex.input call shown in the
section on fetching files from the data tree, except for a
missing geometry argument.
import vortex as vtx

handler = vtx.input(
    kind="analysis",
    date="2024082600",
    model="arpege",
    cutoff="production",
    filling="atm",
    block="4dupd2",
    experiment="xpid",
    local="ICMSHFCSTINIT",
)

# No resource found in description
Report Footprint-Resource:
    vortex.nwp.data.modelstates.Analysis3D
        geometry : {'why': 'Missing value'}
    vortex.nwp.data.modelstates.Analysis4D
        geometry : {'why': 'Missing value'}
        term : {'why': 'Missing value'}
The call to vortex.input fails because no Resource subclass was
found to match the resource attributes specified as arguments to
input. The error message provides a list of candidate classes
together with a description of why each class was not selected.
In particular, it indicates that the class
vortex.nwp.data.modelstates.Analysis3D was not selected because of a
missing geometry argument to input.
Many resource attributes have a set of prescribed values. Specifying
arguments to vortex.input or vortex.output with values not in this
set will also cause the call to fail. For instance, cutoff accepts
only the two values "assim" and "production". The example below sets
the model argument to "foo", which is not among the values prescribed
for model:
import vortex as vtx

handler = vtx.input(
    kind="analysis",
    date="2024082600",
    model="foo",
    cutoff="production",
    filling="atm",
    block="4dupd2",
    experiment="xpid",
    local="ICMSHFCSTINIT",
)

# No resource found in description
Report Footprint-Resource:
    vortex.nwp.data.modelstates.Analysis3D
        model : {'why': 'Not in values', 'args': 'foo'}
    vortex.nwp.data.modelstates.Analysis4D
        term : {'why': 'Missing value'}
It can be difficult to know which arguments to provide to vortex.input
or vortex.output to accurately match the attributes of the
Resource subclass that represents the actual resource you are
targeting.
If you know the Resource subclass name, you can look up the
attributes and their prescribed values in the reference documentation
<todolink>. If you don’t, a good strategy is to build the call to
vortex.input / vortex.output interactively, using the output of
the resource resolution as a guide. You can use the resource
attributes reference as a starting point <todolink>.
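The matching and reporting behaviour described in this section can be approximated in a few lines of plain Python. The sketch below is a simplified model, not vortex's resolution mechanism: the CANDIDATES table, its attribute sets and the resolve helper are all hypothetical, and the prescribed values listed for model are invented for the example.

```python
# Hypothetical candidate table: attribute name -> allowed values (None = any).
CANDIDATES = {
    "Analysis3D": {
        "kind": {"analysis"},
        "model": {"arpege"},  # invented set of prescribed values
        "geometry": None,
    },
    "Analysis4D": {
        "kind": {"analysis"},
        "model": {"arpege"},
        "geometry": None,
        "term": None,
    },
}


def resolve(description, candidates=CANDIDATES):
    """Return (name of the first matching candidate or None, rejection report)."""
    report = {}
    for name, attributes in candidates.items():
        problems = {}
        for attr, allowed in attributes.items():
            if attr not in description:
                problems[attr] = {"why": "Missing value"}
            elif allowed is not None and description[attr] not in allowed:
                problems[attr] = {"why": "Not in values", "args": description[attr]}
        if not problems:
            return name, report
        report[name] = problems
    return None, report


# Missing geometry: no candidate matches, and the report says why.
match, report = resolve({"kind": "analysis", "model": "arpege"})
print(match)
for name, problems in report.items():
    print(name, problems)
```

Adding geometry="franmgsp" to the description makes the first candidate match in this sketch, which is the kind of incremental adjustment the interactive strategy above relies on.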
Setting default arguments#
It is common for several calls to vortex.input or
vortex.output to share a large part of their argument
specifications. To avoid repeating the same arguments in
every call, vortex provides the defaults function.
In the following example, both calls to input and output share the
arguments specified by the call to vortex.defaults:
import vortex as vtx

vtx.defaults(
    kind="analysis",
    date="2024082600",
    cutoff="production",
    model="arpege",
    experiment="xpid",
)

input_handler = vtx.input(
    kind="analysis",
    filling="atm",
    block="4dupd2",
    local="ICMSHFCSTINIT",
)

# ...

output_handlers = vtx.output(
    kind="historic",
    block="forecast",
    term=[0, 1, 2, 3],
    local="ICMSHFCST+[term::fmthm]",
)
)