Workflow Program Requirements

From SCECpedia

Revision as of 13:38, 31 August 2010

We recommend the following programming standards for CME scientific software. These standards help the codes interoperate. For programs to be hosted as components in workflows, we ask scientists to prepare their code this way.

  1. The code should return an exit code when it finishes, and it should use only two values: 0 for a successful run and 1 for an error.
  2. For each code, the number of input and output files must be known and should always be the same whenever the program is run.
  3. There should be no compiled-in references to programs, path names, or file names in the code. It is okay to set default file names; however, users of the code must be able to override any default file name with a command line parameter.
  4. If the code calls any other executables, the program should accept a command line parameter giving the directory where those executables can be found.
  5. The code should accept command line parameters for all input file names and should not assume that an input file has a particular name.
  6. The code should accept command line parameters for all output files so that we can assign the output file names and the output directory.
  7. The code should read variable inputs either from the command line or from a configuration file. Each code should have its own configuration file, or its own section in a shared configuration file, containing only parameters related to that code and no other programs.
  8. If the code accepts parameters on the command line, it is better to use name/value pairs (e.g. -o output_file_name -i input_file_name) rather than positional values (e.g. the first command line parameter is the input file name and the second is the output file name).
  9. If the code requires any environment variables, when the program starts to run, it should verify that they are successfully read or exit with an error.
  10. When SCEC runs the program, we will probably submit it to a job scheduler such as PBS using a job script. Demonstrating that the software can be run through a PBS job script makes it much easier for us to use the program in a workflow.
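
The following is a minimal Python sketch of a command line program that follows items 1 and 3 to 9 above. It is an illustration under stated assumptions, not a SCEC template: the program name `example_prog`, the environment variable `EXAMPLE_HOME`, the option names, and the config section name are all hypothetical placeholders, and `run()` stands in for the real scientific computation.

```python
import argparse
import configparser
import os
import sys

# Hypothetical names used only for this sketch.
PROGRAM = "example_prog"
REQUIRED_ENV_VARS = ["EXAMPLE_HOME"]
DEFAULT_OUTPUT = "example_prog.out"  # a default is allowed, but -o overrides it (item 3)


def parse_args(argv):
    p = argparse.ArgumentParser(prog=PROGRAM)
    # Named options instead of positional arguments (item 8).
    p.add_argument("-i", "--input", required=True, help="input file path (item 5)")
    p.add_argument("-o", "--output", default=DEFAULT_OUTPUT, help="output file name (item 6)")
    p.add_argument("-d", "--output-dir", default=".", help="output directory (item 6)")
    p.add_argument("-c", "--config", help=f"config file with a [{PROGRAM}] section (item 7)")
    p.add_argument("--bin-dir", default="", help="directory of helper executables (item 4)")
    return p.parse_args(argv)


def check_environment(names):
    """Return the required environment variables that are unset or empty (item 9)."""
    return [n for n in names if not os.environ.get(n)]


def load_params(config_path):
    """Read only this program's own section of the config file (item 7)."""
    if config_path is None:
        return {}
    cfg = configparser.ConfigParser()
    if not cfg.read(config_path):  # read() silently skips files it cannot open
        raise OSError(f"cannot read config file {config_path}")
    if not cfg.has_section(PROGRAM):
        raise KeyError(f"config file has no [{PROGRAM}] section")
    return dict(cfg[PROGRAM])


def run(input_path, output_path, params, bin_dir):
    """Placeholder computation: copies the input. A real code would use `params`
    and locate helpers as os.path.join(bin_dir, name) (item 4)."""
    with open(input_path) as src, open(output_path, "w") as dst:
        dst.write(src.read())


def main(argv=None):
    try:
        args = parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 2 on bad arguments; map to the two allowed codes (item 1).
        return 0 if exc.code == 0 else 1
    missing = check_environment(REQUIRED_ENV_VARS)
    if missing:
        print(f"{PROGRAM}: missing environment variable(s): {', '.join(missing)}",
              file=sys.stderr)
        return 1
    try:
        params = load_params(args.config)
        out_path = os.path.join(args.output_dir, args.output)
        run(args.input, out_path, params, args.bin_dir)
    except (OSError, KeyError, ValueError, configparser.Error) as exc:
        print(f"{PROGRAM}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A typical invocation would be `python example_prog.py -i in.txt -o out.txt -d results -c run.cfg`. Catching `SystemExit` around argument parsing is a design choice: without it, argparse would exit with status 2 on a bad command line, which breaks the two-value convention of item 1.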

Broadband Platform Specific Recommendations

  1. Programs should read and write seismograms in Broadband Platform format, which stores all 3 components in a single file.
  2. Programs should work with seismograms of arbitrary length (i.e., remove any restriction that the number of samples be a power of 2).
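
A minimal Python sketch of both Broadband Platform items: a reader that keeps the three components together, and a spectrum routine that accepts records of any length by zero-padding internally before a power-of-2 FFT. The column layout assumed by `parse_three_component` (a time column followed by three component columns, with `#` or `%` comment lines) is an assumption for illustration only, not a statement of the official format, and all function names are hypothetical.

```python
import cmath
import math


def parse_three_component(lines):
    """Parse rows of 'time c1 c2 c3' into four lists.

    ASSUMED layout for illustration only: whitespace-separated ASCII, one time
    column plus three component columns, comment lines starting with '#' or '%'.
    """
    t, c1, c2, c3 = [], [], [], []
    for line in lines:
        s = line.strip()
        if not s or s[0] in "#%":
            continue
        fields = s.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {s!r}")
        for col, val in zip((t, c1, c2, c3), fields):
            col.append(float(val))
    return t, c1, c2, c3


def next_pow2(n):
    """Smallest power of 2 that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2 (the legacy restriction)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fourier_amplitude(samples, dt):
    """Fourier amplitude spectrum of a record of ANY length.

    The radix-2 FFT still needs 2**k points, so the record is zero-padded
    internally; callers never have to trim or pad their seismograms.
    Returns (frequencies in Hz, |X(f)| * dt) for 0 <= f <= Nyquist.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty record")
    npad = next_pow2(n)
    spectrum = fft_radix2(list(samples) + [0.0] * (npad - n))
    df = 1.0 / (npad * dt)
    half = npad // 2 + 1
    freqs = [k * df for k in range(half)]
    amps = [abs(spectrum[k]) * dt for k in range(half)]
    return freqs, amps
```

Zero-padding to the next power of 2 only makes the frequency grid of the spectrum denser; the caller's record is never truncated, which is the practical meaning of dropping the power-of-2 restriction.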