Rethinking the Inference Stack. Most AI inference optimisation focuses on individual layers such as model compression or cache tuning. SHIP instead reworks the entire inference li ...