Data Carousel is a use case of iDDS. iDDS is used in Data Carousel to release tasks promptly, avoid redundant job attempts, and improve error handling.
- Prodsys2 submits tasks with inputPreStaging and toStaging parameters to JEDI.
- Tasks go to staging state in JEDI.
- Prodsys2 creates rules in Rucio to stage in data from tape to disk, taking into account global shares, the number of requests in each FTS channel, and other factors. It notifies JEDI immediately, without waiting for the rules to complete.
- For each task, JEDI finds the scope/name of the input dataset and the corresponding rule, and sends a request to iDDS.
- iDDS creates a transform object for each request to monitor the rule. Note that in this use case, tape and disk file replicas are regarded as the input and output collections, respectively, so no data are actually transformed.
- iDDS periodically checks the rule and notifies JEDI via ActiveMQ when files become available on disk (or disk buffer of the tape system).
- JEDI generates jobs using only input files that have disk replicas, assigns the jobs, and submits them to PanDA.
- PanDA creates rules in Rucio to transfer input files if jobs are assigned to satellites where input data are unavailable.
- Jobs start once their input files are available.
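The monitoring step can be illustrated with a small in-memory model: a transform remembers which files it has already reported, each check returns only the files that newly reached disk, and JEDI builds jobs from that batch alone. This is a minimal sketch, not iDDS or JEDI code; `Transform`, `check_rule`, `generate_jobs`, the `"OK"`/`"REPLICATING"` labels (modelled loosely on Rucio replica lock states) and the file names are all hypothetical, and no real Rucio or ActiveMQ calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class Transform:
    """Hypothetical model of an iDDS transform watching one staging rule.

    Tape replicas play the role of the input collection and disk replicas
    the output collection; no data is actually transformed.
    """
    rule_id: str
    notified: set = field(default_factory=set)

    def check_rule(self, replica_states):
        """Return files that reached disk since the previous check.

        `replica_states` maps file name -> state; "OK" means the disk replica
        exists. These state labels are an assumption for this sketch.
        """
        on_disk = {name for name, state in replica_states.items() if state == "OK"}
        new_files = sorted(on_disk - self.notified)
        self.notified |= on_disk
        return new_files  # payload iDDS would send to JEDI (e.g. via ActiveMQ)


def generate_jobs(task_id, available_files, files_per_job=2):
    """Hypothetical JEDI step: build jobs only from files with disk replicas."""
    return [
        {"task_id": task_id, "inputs": available_files[i:i + files_per_job]}
        for i in range(0, len(available_files), files_per_job)
    ]


transform = Transform(rule_id="rule-1")

# First poll: only f1 and f2 have been staged to disk.
batch1 = transform.check_rule(
    {"f1": "OK", "f2": "OK", "f3": "REPLICATING", "f4": "REPLICATING"}
)
jobs1 = generate_jobs(task_id=123, available_files=batch1)

# Second poll: f3 and f4 have arrived; f1 and f2 are not reported again.
batch2 = transform.check_rule({"f1": "OK", "f2": "OK", "f3": "OK", "f4": "OK"})
jobs2 = generate_jobs(task_id=123, available_files=batch2)

print(batch1, len(jobs1))  # ['f1', 'f2'] 1
print(batch2, len(jobs2))  # ['f3', 'f4'] 1
```

Tracking already-notified files is what lets each file be used exactly once, as soon as it lands on disk, instead of waiting for the whole rule to complete.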
Without iDDS, JEDI is not aware of which files are on disk while input data are being staged in, so it generates jobs from input files even if some of them do not yet have disk replicas. This causes the following issues:
- Jobs create additional, redundant rules in Rucio and requests in FTS to stage in input files that are available only on tape.
- Such jobs tend to sit silently in the assigned state, occupying queues until they eventually time out. They may prevent new jobs from being assigned to those queues, leading to unbalanced job distribution.
A naive solution is to not generate jobs until 90% of the input files are available on disk, but then tasks are not released quickly. iDDS solves these issues by letting JEDI use only the files that have disk replicas, as soon as those replicas appear.
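To make the trade-off concrete, the sketch below compares when jobs can first be generated under the naive 90% threshold and under per-file release. The staging times are invented for illustration and the helper names are hypothetical; the point is only that a few slow files delay the naive policy while already-staged files sit unused.

```python
def naive_release_hour(arrival_hours, threshold_percent=90):
    """Hour at which a naive policy releases a task: once `threshold_percent`
    of the input files have disk replicas. `arrival_hours` must be non-empty."""
    times = sorted(arrival_hours)
    needed = (threshold_percent * len(times) + 99) // 100  # integer ceiling
    return times[needed - 1]


def incremental_release_hour(arrival_hours):
    """Hour at which the first jobs can be generated if every file is used as
    soon as its disk replica appears."""
    return min(arrival_hours)


def files_usable_at(arrival_hours, hour):
    """Number of files with a disk replica at the given hour."""
    return sum(1 for t in arrival_hours if t <= hour)


# Hypothetical staging times (hours after the rules were created).
arrivals = [1, 1, 2, 2, 3, 20, 21, 22, 23, 40]

print(naive_release_hour(arrivals))        # 23
print(incremental_release_hour(arrivals))  # 1
print(files_usable_at(arrivals, 10))       # 5
```

With these numbers, half of the files are already on disk at hour 10, yet the naive policy generates no jobs until hour 23.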
Future improvements to shorten the tails of task completion
iDDS gives JEDI knowledge of problematic files, so iDDS and/or JEDI can take action when necessary. For example, when files are stuck for a long time, iDDS could create new rules to transfer them from other sites. The possible actions and the conditions that trigger them are yet to be defined.
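Since the actions and trigger conditions are still open, the following is only one possible policy, sketched under explicit assumptions: a file that is not yet on disk qualifies for a new rule if it is flagged as stuck or has been pending longer than a threshold, and another replica exists elsewhere. `FileStatus`, `propose_actions`, the 48-hour default, the state labels and the site names are all hypothetical and do not correspond to any existing iDDS interface.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    name: str
    state: str            # "OK", "REPLICATING" or "STUCK" (assumed labels)
    pending_hours: float  # time since the stage-in request was made
    other_sites: tuple    # hypothetical: sites holding another replica


def propose_actions(files, max_pending_hours=48.0):
    """Return (file name, source site) pairs for which a new transfer rule
    could be created.

    A file qualifies if it is not on disk yet and is either flagged STUCK or
    has been pending longer than `max_pending_hours`, and another replica
    exists elsewhere. The conditions and threshold are illustrative only.
    """
    actions = []
    for f in files:
        if f.state == "OK":
            continue  # already on disk, nothing to do
        overdue = f.state == "STUCK" or f.pending_hours > max_pending_hours
        if overdue and f.other_sites:
            actions.append((f.name, f.other_sites[0]))
    return actions


files = [
    FileStatus("f1", "OK", 5, ("SITE_B",)),
    FileStatus("f2", "REPLICATING", 72, ("SITE_B",)),  # pending too long
    FileStatus("f3", "STUCK", 3, ("SITE_C",)),         # flagged as stuck
    FileStatus("f4", "REPLICATING", 10, ("SITE_C",)),  # still within limits
    FileStatus("f5", "REPLICATING", 100, ()),          # no other replica
]
print(propose_actions(files))  # [('f2', 'SITE_B'), ('f3', 'SITE_C')]
```

Keeping the decision in a pure function that only proposes actions makes it easy to tune thresholds before any rule is actually created.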
iDDS ATLAS data carousel status monitor
status: The status of the transform.
- Finished: All files are processed.
- Transforming: Some files are still being processed.
in_status: Input dataset status. If the input dataset is not closed, other systems may still add files to it, so iDDS keeps monitoring the input dataset for newly added files.
in_total_files: Total number of files in the input dataset.
in_processed_files: Total number of files handled by iDDS (files that are used as input in an iDDS transformation).
out_status: The status of the output dataset. It is closed (and the transform is finished) when:
- All input files are processed and the input dataset is closed.
- All output files are processed.
out_total_files: Total number of files in the output dataset.
output_processed_files: Total number of processed files in the output dataset.
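The closing rule for the output dataset can be written as a small predicate over the monitor fields. This sketch reads the two listed conditions as jointly required and assumes the transform is Finished exactly when its output dataset is closed; both readings, the `CarouselRecord` class and the `"open"`/`"closed"` labels are assumptions for illustration, while the field names follow the monitor.

```python
from dataclasses import dataclass


@dataclass
class CarouselRecord:
    """One row of the status monitor; field names follow the monitor."""
    in_status: str               # "open" or "closed" (assumed labels)
    in_total_files: int
    in_processed_files: int
    out_total_files: int
    output_processed_files: int

    @property
    def out_status(self):
        # Assumption: both listed conditions must hold for the output to close.
        inputs_done = (
            self.in_status == "closed"
            and self.in_processed_files == self.in_total_files
        )
        outputs_done = self.output_processed_files == self.out_total_files
        return "closed" if inputs_done and outputs_done else "open"

    @property
    def status(self):
        # Assumption: the transform is Finished exactly when its output is closed.
        return "Finished" if self.out_status == "closed" else "Transforming"


# Input still open: more files may be added, so the transform keeps running.
print(CarouselRecord("open", 10, 10, 10, 10).status)    # Transforming
print(CarouselRecord("closed", 10, 10, 10, 7).status)   # Transforming
print(CarouselRecord("closed", 10, 10, 10, 10).status)  # Finished
```

The first case shows why in_status matters: even with every known file processed, an open input dataset keeps the transform in Transforming.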