Real-time cognitive assistance is one of the most exciting applications in the age of Augmented Reality (AR). Several research groups have explored cognitive assistants, embodied in smartphones or wearable AR glasses, that guide users through unfamiliar tasks (e.g., assembling a piece of furniture or following a recipe). These systems generally consist of two high-level modules, a perceptual module (e.g., a deep-learning-based vision system) and a cognitive module (implemented as a rule engine or state machine), and must operate in near real time. As such, cognitive assistants are an illustrative use case for edge computing. While prior work has focused on pushing the frontier of what is possible, it suffers from shortcomings that hinder practical deployment. First, much research on cognitive assistants assumes an accurate visual perception system, which may not hold in practice. Second, while some work has explored user errors during task performance, the approach does not scale: possible errors must be explicitly enumerated a priori in a state-machine representation. To address these limitations, in this paper we propose (i) involving users in resolving the ambiguity and uncertainty of visual inputs, and (ii) employing automated planning tools and execution-monitoring techniques to track task state and, when necessary, generate new plans that recover from users' mistakes. To verify the feasibility of our system, we implemented and tested it on both an Android phone and a HoloLens 2, supported by an edge server for offloading computation.
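The execution-monitoring-with-replanning loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names (`Step`, `monitor`, `replan`, `observe`) are hypothetical, and the planner is a stub standing in for a real automated planning tool.

```python
# Sketch of execution monitoring with recovery replanning.
# All identifiers here are illustrative assumptions, not from the paper.

from dataclasses import dataclass

@dataclass
class Step:
    action: str          # e.g., "attach leg A"
    expected_state: str  # state the perception module should observe afterwards

def replan(current_state, goal_state):
    """Stand-in for an automated planner: produce a fresh step sequence
    from the (unexpected) current state back to the goal."""
    return [Step(action=f"recover: {current_state} -> {goal_state}",
                 expected_state=goal_state)]

def monitor(plan, observe, goal_state):
    """Execute steps in order; whenever the observed state diverges from
    the expected one (e.g., a user mistake or a misperception), discard
    the remaining plan and replan from the observed state."""
    steps = list(plan)
    while steps:
        step = steps.pop(0)
        observed = observe(step)  # perception module output (possibly noisy)
        if observed != step.expected_state:
            steps = replan(observed, goal_state)
    return goal_state
```

In a deployed system the `observe` callback would be backed by the vision module running on the edge server, and ambiguous observations could be disambiguated by asking the user, as proposed above.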