We found during testing that without prediction explanations the response time is 20 milliseconds. With prediction explanations it's 5 seconds. The latter is too long to run on every transaction in real time, since it hurts the customer experience. We are looking to split the deployments and score at intervals (e.g., every 12 hours).
Has anyone had a similar experience? Presumably this is the normal expectation, given the extra compute required?
This is expected behavior for permutation-based prediction explanations. If low latency and prediction explanations are both key requirements, try creating another project with SHAP explanations; that way latency should stay within the same order of magnitude as scoring without any explanations.
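To see why permutation-based explanations cost so much more, note that they have to re-score each row many times, once per perturbed feature, while a plain prediction is a single model call. Here is a toy sketch of that cost multiplier (this is not any vendor's actual explanation algorithm; the function `explain_by_perturbation` and the mean-substitution perturbation are made up for illustration, assuming scikit-learn is available):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
row = X[:1].copy()

def explain_by_perturbation(model, row, background):
    """Toy permutation-style explanation: re-score the row once per
    feature with that feature replaced by its background mean, and
    report the change in the positive-class probability."""
    base = model.predict_proba(row)[0, 1]  # 1 model call, same as plain scoring
    impacts = []
    for j in range(row.shape[1]):
        perturbed = row.copy()
        perturbed[0, j] = background[:, j].mean()
        # One extra model call per feature: 1 + n_features calls total,
        # versus exactly 1 call for an unexplained prediction.
        impacts.append(base - model.predict_proba(perturbed)[0, 1])
    return impacts

impacts = explain_by_perturbation(model, row, X)
print(len(impacts))  # one impact value per feature
```

With 20 features this single explained row already costs 21 model calls instead of 1, and real permutation schemes typically sample multiple perturbations per feature, which is why latency jumps by orders of magnitude. Tree-based SHAP implementations avoid this by computing attributions in a single pass over the tree structure.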