On Dec 4th, five months after VERIQA commissioning, the calculation counter showed 1830. Time to present some results.
First, we want to sum up how we currently work with VERIQA.
(A typical ARIA OIS Version 17 Care Path. VERIQA is part of the Dosecheck task.)
(iCheckNet with patient QA results attached to the clinical plans.)
A treatment plan is currently accepted whenever the (3%/3mm)-Gamma test (GAM33) is passed for the BODY. This means that the Gamma Agreement Index (GAI), in this case GAI33, has to be at least 95% for the plan to get a green checkmark.
For stereotactic plans, where tighter tolerances are generally assumed, we skip GAM33 and evaluate GAM22 or GAM21 instead, depending on slice thickness.
Gamma calculation is always suppressed for voxels with dose below 5% of maximum compared dose ("reference dose" = TPS dose, "compared dose" = MC dose).
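The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not VERIQA's implementation. The function names (`gamma_agreement_index`, `is_accepted`) are hypothetical, and the sketch assumes that a voxel passes when its Gamma value is at most 1 and that voxels exactly at the 5% cutoff are still evaluated.

```python
def gamma_agreement_index(gamma, compared_dose, low_dose_fraction=0.05):
    """Return the GAI in percent for flat lists of per-voxel values.

    gamma         : Gamma value per voxel
    compared_dose : compared (MC) dose per voxel, same order as gamma
    Voxels with compared dose below low_dose_fraction * max(compared_dose)
    are excluded from the evaluation (assumed reading of the 5% rule).
    """
    cutoff = low_dose_fraction * max(compared_dose)
    evaluated = [g for g, d in zip(gamma, compared_dose) if d >= cutoff]
    if not evaluated:
        raise ValueError("no voxels above the low-dose cutoff")
    passed = sum(1 for g in evaluated if g <= 1.0)
    return 100.0 * passed / len(evaluated)


def is_accepted(gai_percent, tolerance_percent=95.0):
    """Green checkmark if the GAI reaches the tolerance level."""
    return gai_percent >= tolerance_percent
```

For example, `is_accepted(gamma_agreement_index(gamma, mc_dose))` would reproduce the GAI33 decision if `gamma` came from a (3%/3mm) evaluation of the BODY.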
For fast results with standard plans, the default GAM33 protocol uses 1% dose engine precision. For this precision, we determined the calculation time ("dose engine running") with a stopwatch in a sample of 25 (mostly VMAT) plans. It ranged from 22 to 288 seconds, depending on a number of factors such as voxel size, PTV volume, photon energy, and so on.
The GAM22 and GAM21 protocols use a precision of 0.5%. It takes about 2-3 times longer for SciMoCa™ to calculate to 0.5% instead of 1%.
Shortly after we started to work with VERIQA, we also changed the default calculation algorithm from AAA to AcurosXB (AXB). Now all photon plans are AXB calculated.
Some care should be taken when choosing small Distance to Agreement (DTA) values in the Gamma protocols, because this may lead to unwanted effects. We want to demonstrate this with a simple example.
In this patient, endocrine orbitopathy in Graves' disease was treated with two static, opposing fields (6 MV) at Gantry 90° and 270°. The prescription dose was 8 x 0.3 Gy = 2.4 Gy to target mean (18 MU per field). In terms of dose calculation challenges, this simple example still offers everything (tissue, air, and dense bone):
(Treating endocrine orbitopathy with two opposing fields. PTV in red.)
The Eclipse calculation grid size was 2 mm. Since all our VERIQA templates specify the MC dose grid to be identical to the TPS dose grid, GAMSciMoCa used 2 mm as well. CT slice thickness [1] was 1 mm.
For this plan, we get a GAI22 of 99.95%:
(Plan calculated with 2 mm grid. Left: MC dose, middle: Gamma, right: dose difference. GAI22 = 99.95%)
With a tighter DTA of 1 mm, the GAI21 drops to 98.90%. Visual analysis reveals that this is mainly due to the apparently high Gamma values at the patient surface (red) [2]:
(Plan calculated with 2 mm grid. Left: MC dose, middle: Gamma, right: dose difference. GAI21 = 98.90%)
When two dose distributions are compared and the DTA is smaller than the calculation grid size, the Gamma results can be distorted. Here, the distortion shows up as spuriously high Gamma values at the patient surface.
To confirm this observation, we recalculate the plan with a grid of 1 mm, and repeat the GAM21 evaluation:
(Plan recalculated with 1 mm grid. Left: MC dose, middle: Gamma, right: dose difference. GAI21 = 99.60%)
For the recalculated plan, the GAI21 increases to 99.60%. At the same time, the pearl-like pattern at the BODY surface disappears.
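One possible mechanism behind such grid effects can be shown with a toy 1D model. The sketch below is not how VERIQA computes Gamma (the actual algorithm is not described here and may well interpolate between voxels). It assumes a hypothetical build-up profile of 10% per mm, a compared profile shifted by 1 mm, a global dose tolerance relative to the maximum reference dose, and a Gamma search restricted to grid points. Under these assumptions, a neighbouring voxel can never satisfy a DTA smaller than the grid spacing, so the test collapses to a local dose-difference check.

```python
import math

EPS = 1e-12  # guard against rounding when Gamma is exactly 1


def ramp_dose(x_mm):
    """Hypothetical profile: 10 %/mm build-up from 0 to 100 %, then flat."""
    return min(max(10.0 * x_mm, 0.0), 100.0)


def gamma_1d(x, reference, compared, dose_tol_percent, dta_mm):
    """Global 1D Gamma, searching grid points only (no interpolation)."""
    dose_tol = dose_tol_percent / 100.0 * max(reference)
    return [
        min(
            math.hypot((dj - di) / dose_tol, (xj - xi) / dta_mm)
            for xj, dj in zip(x, compared)
        )
        for xi, di in zip(x, reference)
    ]


def gai(gamma):
    """Percentage of points with Gamma <= 1."""
    return 100.0 * sum(1 for g in gamma if g <= 1.0 + EPS) / len(gamma)


def gai_for_grid(spacing_mm, shift_mm=1.0, dose_tol_percent=2.0, dta_mm=1.0):
    """GAI for a 0..20 mm profile sampled with the given grid spacing."""
    n = int(20 / spacing_mm) + 1
    x = [i * spacing_mm for i in range(n)]
    reference = [ramp_dose(xi) for xi in x]
    compared = [ramp_dose(xi - shift_mm) for xi in x]
    return gai(gamma_1d(x, reference, compared, dose_tol_percent, dta_mm))
```

In this toy model, `gai_for_grid(1.0)` yields 100% because the 1 mm shift is found exactly one voxel away, whereas `gai_for_grid(2.0)` drops to about 54.5% because every point in the gradient region fails. The numbers are artefacts of the toy profile and have no clinical meaning. Only the direction of the effect is of interest.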
In another example, a Liver SBRT (platinum markers, Deep Inspiration Breath Hold) was calculated with a 2 mm grid (slice thickness: 2 mm). Here, however, the GAI21 surface effect is minor, because the ratio of voxel size to the size of the evaluated volume is much smaller than in the orbitopathy example above. It is even hard to spot the "Gamma pearls" at the BODY surface (arrows):
(Liver SBRT in Eclipse - 2 mm grid, 2 mm slice thickness; VERIQA evaluations with GAM22 and GAM21.)
What happens if the dose grid in Eclipse is 1 mm, but the slice thickness [3] is 2 mm? GAMSciMoCa will use 1 x 1 x 1 mm. For intermediate values like 1.25 mm in Eclipse and 2 mm slice thickness, we found that both Acuros and GAMSciMoCa use a calculation grid of 1.25 x 1.25 x 1 mm.
When VERIQA performs an evaluation, it uses Templates according to certain Conditions of application. The templates invoke Gamma protocols and DVH protocols.
We keep it simple. Only three templates are currently in use. The GAM33 template contains a (3%,3mm) global Gamma test and gets executed if the Eclipse plan name (plan label in DICOM terms) does not start with "Hyper", "SRT_", or "SBRT_".
In addition to the Gamma protocol, the GAM33 template also contains a simple DVH protocol which evaluates DVHs and compares Target Mean dose for all PTVs in the plan. However, since we are currently not interested in DVH evaluations in the context of patient plan QA, the DVH protocol is inactive by default. We chose to set it active on a case-by-case basis before the plan is sent to VERIQA (and inactive again as soon as the evaluation is done).
If the plan name does start with one of the three strings mentioned above, we check for the slice thickness [4] in the GAM22 and GAM21 templates. If it is 2 mm, all conditions of application are met for the GAM22 template, which is then executed. If it is 1 mm, GAM21 will run. The reader will get the idea.
This way, the "stereotactic" plans do not evaluate GAM33. However, plan names must follow the naming convention.
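The selection logic of the three templates can be summarised as a small Python function. This is merely a sketch of the conditions described above, not VERIQA configuration syntax. The function name `select_template` is hypothetical, the prefix match is assumed to be case-sensitive, and the behaviour for slice thicknesses other than 1 mm or 2 mm is not covered here, so the sketch returns `None` in that case.

```python
import math

# Plan-label prefixes that mark a plan as "stereotactic" in our naming convention
STEREOTACTIC_PREFIXES = ("Hyper", "SRT_", "SBRT_")


def select_template(plan_label, slice_thickness_mm):
    """Return the template name whose conditions of application are met."""
    if not plan_label.startswith(STEREOTACTIC_PREFIXES):
        return "GAM33"  # default template for all other plans
    if math.isclose(slice_thickness_mm, 2.0):
        return "GAM22"
    if math.isclose(slice_thickness_mm, 1.0):
        return "GAM21"
    return None  # assumption: no template applies automatically
```

A plan labelled "SBRT_Liver" on 2 mm slices would thus be routed to GAM22, while a plan that does not follow the naming convention always ends up in GAM33.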
The reader may ask: Why do we differentiate according to slice thickness? On one hand, a slice thickness of 1 mm and a calculation grid of 1 mm often go together, e.g. for HyperArc plans in the brain. So it makes sense to evaluate these plans with GAM21 right away. On the other hand, non-integer grid sizes like 1.25 mm are sometimes necessary due to memory limitations of the GPU, if the CT dataset has too many slices. The Gamma protocols however can only use integer values of DTA. SBRT plans are more likely based on 2 mm CT data, therefore we evaluate these plans automatically with GAM22. Of course, additional evaluations with GAM21 are always possible.
Defining such conditions of application is not mandatory. Reevaluations can be added at any time, using any available template. Users who accept that GAM33 evaluations are always performed, even if they are not interested in the results, can still add reevaluations using tighter protocols. Our motivation was rather to play around and try to understand the logic behind the Conditions of Application.
We are aware that VERIQA evaluations can be performed for multiple structures, not only the BODY. VERIQA offers all necessary tools to design complicated templates with acceptance criteria for all kinds of plan metrics. But in the end, all the subevaluations have to be distilled down to one green (or yellow or red) icon. For us, focussing on the BODY is currently the best choice.
During preliminary testing, we had our troubles with helper structures that extended outside the BODY. For these structures, Eclipse would not calculate dose for voxels which are located outside the BODY. We tried to solve the problem by using ROI Density Overrides, which in this case were not used to override density but to remove all helper structures during VERIQA import. However, since in our department there is no naming convention for helper structures, this approach failed.
The problem can elegantly be solved by using the BODY as a ROI for the Gamma evaluations:
(Acceptance criteria for the BODY. The limits for "Total" are relaxed since this criterion cannot be deleted.)
There are some drawbacks, though. The BODY in our case is not only the patient volume, but contains all support structures like carbon Head and Neck rests, foam mats, etc. We include these objects in the BODY because Eclipse will dosimetrically ignore everything outside the BODY with the exception of the Couch structure, which is always taken into account. So our BODY is larger than the patient volume, which can sometimes distort the GAI results a little, in both directions: if the support structures are "green" in the Gamma map, the GAI will be slightly better than for the true patient volume. But if deviations occur, e.g. in a large block of foam which supports a knee, the Gamma test could even fail (at least for AAA calculated plans) because of a dose discrepancy in a large volume that does not even belong to the patient.
Adding a separate Eclipse structure which represents the true patient volume of course is possible, but we consider this an extra effort which is not worth it. Deviations, if they occur, are visually analysed anyway. One simply has to know how to interpret the BODY.
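How strongly support structures inside the BODY can shift the GAI is simple arithmetic. The following sketch uses purely hypothetical voxel counts (not taken from any of our plans) and a hypothetical helper `combined_gai` to show both directions of the effect.

```python
def combined_gai(patient_passed, patient_total, support_passed, support_total):
    """GAI in percent for a BODY made of patient plus support-structure voxels."""
    passed = patient_passed + support_passed
    total = patient_total + support_total
    return 100.0 * passed / total


# Hypothetical patient volume: 100000 evaluated voxels, 97000 passing (GAI 97.0 %)
patient_only = combined_gai(97000, 100000, 0, 0)

# Support structure of 20000 voxels that is entirely "green"
with_green_support = combined_gai(97000, 100000, 20000, 20000)

# Same support structure, but only 14000 of its voxels pass
with_failing_support = combined_gai(97000, 100000, 14000, 20000)
```

With these made-up numbers, the GAI rises from 97.0% to 97.5% in the first case and falls to 92.5%, below the 95% tolerance, in the second, although the patient volume itself is unchanged.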
Stability of the VERIQA platform is excellent. Since installation, there were only three occasions on which a server reboot (a simple and quick solution) was performed. Twice, the calculation service was working normally, but DICOM plans sent from VERIQA to RTView for further analysis were not transferred. Once, the calculation job was queued, but did not execute, even after some waiting time.
To reboot the VERIQA server, all we had to do was navigate to the server's network address in Chrome (bookmark recommended!), log into the iDRAC (Integrated Dell Remote Access Controller), from there into VERIQA's Oracle Linux Server, and type "reboot". That's all. About 2 minutes later, VERIQA was back online.
A daily backup is scheduled at 1:00 a.m. which points to a file share on the network. As usual, we hope we'll never need it.
The VERIQA verification results for the clinical AXB plans since July 4th were loaded in iCheckNet and analysed.
Most of the plans (504) were evaluated with the GAM33 template. The average GAI33 for the 504 plans is 99.90% (minimum: 96.52%).
There are 20 plans in the lowest histogram bin between 96.52% and 99.50%:
The number of plans evaluated with the GAM22 template is much smaller (34 plans). The mean GAI22 is 99.28%. The minimum GAI of 96.48% means that all GAM22 evaluations are well above our warning level of 95%.
(A note on the horizontal axis: the increasing evaluation counter goes back in time, so the most recent evaluations are on the left.)
Finally, the GAM21 results. As already mentioned, a DTA of 1 mm should be used with caution. A total of 21 plans were evaluated. The mean GAI21 is 98.78% with a minimum of 95.92%:
Here again, all evaluations are above the 95% warning level.
So far, we worked with a default template which is rather relaxed. One could argue that a DTA of 3 mm is too large, since the two calculations, TPS and MC, are performed on the same CT dataset. This is probably the reason why PTW recommends a GAM32 test for acceptance testing.
In our case, the high mean GAI33 of 99.90% suggests that we could probably change the default template to GAM22 without problems. This is currently under discussion, because we still want to get our green check marks from Professor VERIQA:
[1] The resolution in the imaging plane in this example is 1.4 mm due to the large Field-of-View chosen (512 x 512 px at 700 mm FOV).
[2] This effect would be visible in all body regions (e.g., breast).
[3] Regarding slice thickness, we currently use two settings: 2 mm and 1 mm. This will probably change in Feb 2025, when our Toshiba AquilionLB scanner will be replaced by a Siemens SOMATOM go.Open Pro.
[4] We found the slice thickness in DICOM tag (0018,0050).