Working with smaller datasets can be difficult, especially if you are trying to use OTV. Do you have to use OTV for this particular use case, or could you get away with cross-validation? I think CV with more folds than the default might yield more stable results, since each fold then trains on a larger share of the data. With cross-validation you can also keep the holdout, which is a best practice.
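To make the fold-count point concrete, here is a minimal sketch in plain Python (not any platform's API). The row count of 200 and the helper name `kfold_indices` are assumptions for illustration only, and real CV would normally shuffle or stratify the rows first, which is left out here.

```python
def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous validation folds.

    Returns a list of (train_indices, validation_indices) pairs.
    Hypothetical helper for illustration; no shuffling or stratification.
    """
    base, extra = divmod(n_rows, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        # Spread any remainder rows over the first `extra` folds.
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = [j for j in range(n_rows) if j < start or j >= start + size]
        folds.append((train, val))
        start += size
    return folds


n_rows = 200  # assumed dataset size after setting aside the holdout
for k in (5, 10):
    train, val = kfold_indices(n_rows, k)[0]
    print(f"{k} folds: {len(train)} training rows, {len(val)} validation rows per fold")
```

With more folds, each model sees more training rows, and the final score is averaged over more validation partitions, at the cost of each partition being smaller.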
If you have to use OTV, then given the limited data, run as many backtests as the dataset size reasonably allows, and don't worry about keeping the holdout.
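As a rough way to see how many backtests a small dataset can support, here is a sketch in plain Python. It is not the platform's own partitioning logic: the helper `backtest_splits`, the validation window of 20 rows, and the minimum of 100 training rows are all assumptions, and the rows are assumed to be sorted by time.

```python
def backtest_splits(n_rows, max_backtests, val_size, min_train):
    """Build time-ordered backtests, walking backwards from the newest rows.

    Each backtest trains on every row before its validation window.
    Stops early once fewer than `min_train` training rows would remain.
    Hypothetical helper for illustration only.
    """
    splits = []
    val_end = n_rows
    for _ in range(max_backtests):
        val_start = val_end - val_size
        if val_start < min_train:
            break  # not enough history left to train on
        splits.append((range(0, val_start), range(val_start, val_end)))
        val_end = val_start
    return splits


# Assumed numbers: 200 time-ordered rows, no holdout reserved.
splits = backtest_splits(n_rows=200, max_backtests=10, val_size=20, min_train=100)
for i, (train, val) in enumerate(splits, start=1):
    print(f"backtest {i}: train rows 0-{train[-1]}, validate rows {val[0]}-{val[-1]}")
```

Under these assumed numbers only 5 of the 10 requested backtests fit. Reserving a holdout would take rows off the most recent end and leave room for even fewer, which is why dropping it can be a reasonable trade on a small dataset.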