******************************************
Setting up Data and Creating New Projects
******************************************

This section discusses the environment variables used to access data, process files, and executables; how to set up data using the environment variables; the various locations the environment variables can point to on development servers; and how to create new process repositories to store code. It also provides an overview of the key development stages in the `ADI Development Steps`_ section presented at the end.

=====================================
Environment Variables
=====================================

ADI shared libraries, VAPs, and ingests use environment variables to determine the location of data, configuration files, and binaries. Because these locations are defined through environment variables, they can be changed by resetting the variables, without having to change any source code.

----------------------------
Data Environment Variables
----------------------------

- DATA_HOME

  - The directory in which the data subdirectories required by ARM processes are located
  - Base directory for datastream, configuration, logs, and quicklook data, each of which also has its own environment variable
  - Expected last directory in path = 'data'
  - Subdirectories:

    - conf
    - datastream
    - logs
    - quicklook(s) or www/process

  - Examples:

    - prod location: /data/
    - test location: /data/home/dev/vap//DATA/data
    - user location: /data/home//data

- DATASTREAM_DATA

  - Location for netCDF data. For VAPs this includes input and output netCDF data
  - This must be equal to $DATA_HOME/datastream
  - Expected last directory in path = 'datastream'
  - Subdirectories: //. where
  - Examples:

    - prod location: /data/datastream
    - test location: /data/home/dev/vap//DATA/data/datastream
    - user location: /data/home//data/datastream

- LOGS_DATA:

  - Location of logs generated during a run.
  - This must be equal to $DATA_HOME/logs.
  - Expected last directory in path = 'logs'
  - Subdirectories: //proc_logs/

    **Note: these subdirectories are always created by the ADI libraries.**

  - Examples:

    - prod location: /data/logs/
    - test location: /data/home/dev/vap/twrmr/DATA/data/logs
    - user location: /data/home//data/logs

- COLLECTION_DATA:

  - Location of raw input files (applies to ingests only)
  - This must be equal to $DATA_HOME/collection
  - Expected last directory in path = 'collection'

- CONF_DATA

  - This must be equal to $DATA_HOME/conf
  - Location for configuration files that change more than once a year. Within this directory files can be organized by site in

    - $CONF_DATA//

    or by VAP in

    - $CONF_DATA//

There are two additional datastream environment variables that can be used to isolate the input data sources from the output data sources. This is useful if you want to read the data in from /data/archive but write it out to another area.

- DATASTREAM_DATA_IN

  - Expected last directory in path = 'datastream'
  - Same as DATASTREAM_DATA but only used to find the input datastream directories.

- DATASTREAM_DATA_OUT

  - Expected last directory in path = 'datastream'
  - Same as DATASTREAM_DATA but only used to find the output datastream directories.

------------------------------------
VAP Executable Environment Variable
------------------------------------

The following environment variables are only required if the VAP makes use of configuration files.

- VAP_HOME

  - Base directory for VAP binaries and executables.
  - Expected last directory in path: none; there is no expected last directory name.
  - Subdirectories:

    - bin
    - bytecode
    - conf
    - include

  - Examples:

    - prod location: /apps/process
    - user location: /home//apps/process

---------------------------------------
VAP Configuration Environment Variable
---------------------------------------

- VAP_HOME/vap/conf

  - Location of configuration files that do not change over time, or change at most once a year.
    These files are maintained in the VAP's GitLab repository and released to $VAP_HOME/vap/conf as part of the build process.

Methods of updating VAP configuration files in CONF_DATA
--------------------------------------------------------------

Because the files in $CONF_DATA are not released, an alternative method of installing them on the production processing system is needed. There are two possible methods of updating files in CONF_DATA: (1) create a standalone task in ServiceNow to have the system administrators copy them into the desired location, or (2) use doorstep to install the configuration files. Details for both methods are described below.

- Updating files by requesting that they be copied to production.

  This method is recommended for files that will be updated infrequently (a few times a year), or that only need to be transferred to production once, when the VAP (or a new site for the VAP) is set up, because subsequent updates will be done automatically by the VAP process. Requests to transfer files to production should be made via ServiceNow, preferably in an ENG or EWO associated with the VAP or, if those are not available, in a standalone incident. Describe where the files should be installed in $CONF_DATA and where the files to transfer to production can be found, and assign the request to the ADC system administrators.

- Updating files using doorstep:

  **This method can currently ONLY be used to install files to $CONF_DATA//.** As such it only supports installation of conf files that require a separate file for each site and facility.

  To use this method:

  (a) Notify the individual who will be providing the new or updated files to deliver them via ftp.arm.gov as 'anonymous', using their email as the password. They should place the files in the directory corresponding to the site and facility to which the conf files apply (i.e. /pub/sites//_conffiles).
  (b) Submit a task in ServiceNow to have the doorstep.conf file updated.
      Preferably the task should be a child of an ENG or EWO associated with the VAP or, if those are not available, a standalone incident. Assign this task to Brian Ermold. Note the process name, the sites and facilities that will have files, and who should receive notification that files have been updated.

=====================================
ARM Data Locations
=====================================

On production the location of DATA_HOME is always /data. However, where to set DATA_HOME when running on the development server depends on why the process is being executed. The location differs based on whether the process is being run

- on production in an event-driven mode
- for production in ARMFlow in manual mode
- for production in ARMFlow in reprocess mode
- to execute a formal process test
- to run an evaluation VAP whose output will be shipped to the archive
- to do large-scale testing to validate the logic and algorithm of a process.

For the first three cases, which relate to production processing, the user executes the processes via ARMFlow, and ARMFlow sets up all the environment variables. This section discusses the test data area, the locations for processing evaluation data, and where to set up data for large-scale testing and validation.

---------------
Test Data Area
---------------

This is a defined area for formal tests associated with VAPs and ingests. It is the location where the dsproc_test application expects to find the input data for the test cases it executes, and where it writes the output data it creates. The location is a function of the type of process (VAP or ingest) and includes a subdirectory 'DATA' prior to the end point directory 'data'.
- Path and example for VAP

  - DATA_HOME = /data/home/dev/vap//DATA/data
  - DATA_HOME = /data/home/dev/vap/twrmr/DATA/data

- Path and example for ingest

  - DATA_HOME = /data/home/dev/ingest//DATA/data
  - DATA_HOME = /data/home/dev/ingest/sirs/DATA/data

- This directory and its child directories must all be set up by the developer responsible for the VAP.
- Input files for each input data source must be copied to this area. **Do NOT use symbolic links to /data/archive**, as this could cause a test case to fail unexpectedly should the files in /data/archive be reprocessed.
- All files created should be group-writable, rw-rw-r-- (i.e. 664, with directories at 775), so that any developer can run the test and overwrite existing data.

---------------------
Evaluation Data Area
---------------------

Currently evaluation processes cannot be run through ARMFlow. This area is set aside for creating evaluation data that is intended to be shipped to the archive via a ServiceNow Release Data to Archive workflow. The location is a function of the user who will be running the process and of the process being run (i.e. the VAP repository, not the PCM process).

- Path and example for VAP

  - DATA_HOME = /data/vap///
  - DATA_HOME = /data/vap/gaustad/mfraod

- Typically a 'data' directory is not included and DATA_HOME is set as above. A 'data' directory can be added if the user chooses, as long as the child environment variables are all defined with respect to DATA_HOME (i.e. DATASTREAM_DATA = DATA_HOME/datastream).
- This directory and its child directories must all be set up by the developer responsible for the VAP.
- Input directories for each input data source should typically use symbolic links to /data/archive. **Do not copy files if they are unchanged from the /data/archive area.**
- Files created should NOT be group-writable (i.e. not rw-rw-r--, and directories not 775), so that other developers cannot overwrite existing data.
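
The requirement that the child environment variables all be defined with respect to DATA_HOME can be sketched in Python. This is only an illustration and not part of ADI: the helper names ``adi_data_env`` and ``inconsistent_vars`` are hypothetical, and the sketch assumes only the four child variables and subdirectory names listed in the Data Environment Variables section.

```python
from pathlib import PurePosixPath

# Child variables and the subdirectory of DATA_HOME each one must point to,
# as listed in the Data Environment Variables section.
CHILD_DIRS = {
    "DATASTREAM_DATA": "datastream",
    "LOGS_DATA": "logs",
    "CONF_DATA": "conf",
    "COLLECTION_DATA": "collection",
}


def adi_data_env(data_home):
    """Return DATA_HOME plus the child variables derived from it (hypothetical helper)."""
    home = PurePosixPath(data_home)
    env = {"DATA_HOME": str(home)}
    for name, subdir in CHILD_DIRS.items():
        env[name] = str(home / subdir)
    return env


def inconsistent_vars(env):
    """Return the child variables that do not equal $DATA_HOME/<subdir> (hypothetical helper)."""
    expected = adi_data_env(env["DATA_HOME"])
    return sorted(name for name in CHILD_DIRS if env.get(name) != expected[name])


# Example using the evaluation-area path shown above.
env = adi_data_env("/data/vap/gaustad/mfraod")
for name, value in env.items():
    print(f"{name}={value}")
```

Passing the returned dictionary to ``os.environ.update`` before launching a process would apply the settings to the current environment; ``inconsistent_vars`` can be used to check an existing environment against the same rule.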

---------------------
User Data Area
---------------------

An additional location a developer can point to is their own /data/home/ area. This area can be used for testing outside of formal test data cases, but it is not intended to produce data that will be shipped to the archive. It is more of a scratch area. As such, DATA_HOME is not required to be in any particular directory in /data/home/. This location can be used for digging deeper into certain periods where problems are found during larger-scale processing in the /data/vap area, without having to overwrite the files in /data/vap.

- Paths are not managed beyond requiring that they be in /data/home/; they can be set there directly, organized by process_name, or arranged any other way a developer chooses. Sample locations include

  - DATA_HOME = /data/home//data
  - DATA_HOME = /data/home///data
  - DATA_HOME = /data/home//DATA/data

- This directory and its child directories must all be set up by the developer responsible for the VAP.
- Input directories for each input data source should typically use symbolic links to /data/archive, but can also link to another developer's area or contain actual files if the number is small.

=====================================
Creating New Repositories in GitLab
=====================================

Need section on creating empty repository

-------------------
create_adi_project
-------------------

create_adi_project is a source code generation tool that uses the PCM database entries to create a C, IDL, or Python software project for processes defined in the PCM. The project files are created by a script generator and can compile and run with no additional code, producing netCDF files with all variables that can be derived from the database entries made via the PCM. The source code produced has hooks into which users can insert their own code, thus jump-starting the development of their ARM Value Added Products (VAPs).

After the VAP process has been fully defined in the PCM and saved to the DSDB, the create_adi_project application can be run to create a C or Python project (use of IDL is discouraged) comprising

- a main module,
- hooks for the ADI Data Processing Modules (shown in green at '_),
- supporting files documenting retrieved, transformed, and output variables, and
- files needed to build the VAP.

There are templates to create ingest and VAP projects.

create_adi_project Command Line Arguments
------------------------------------------

The required input parameters for create_adi_project specify the process for which templates are being produced, the template type, and the directory into which the templates will be created. Optional input parameters are provided to document the source code with the developer's contact information, to dump the DSDB elements associated with the process into a JSON data file, and to run from such a JSON dump rather than accessing process information from the DSDB. A complete summary of the create_adi_project command line options is shown in the following table along with an example.

+-------------------------------------------------------------------------+
| create_adi_project Usage                                                |
+==================+===============+========+=============================+
| **Input**        | **Argument**  | **Req**| **Argument Description**    |
| **Arguments**    | **Value**     |        |                             |
+----+-------------+---------------+--------+-----------------------------+
| -h | --help      |               | N/A    |                             |
+----+-------------+---------------+--------+-----------------------------+
| -p | --process   |               | Yes    | Name process defined in PCM |
+----+-------------+---------------+--------+-----------------------------+
| -t | --template  |