Robot assembly discovery is a challenging problem that lives at the intersection of resource allocation and motion planning. The goal is to combine a predefined set of objects to form something new while considering task execution with the robot in the loop. In this work, we tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robot. Our novel hierarchical approach aims to efficiently decompose the overall task into three feasible levels that mutually benefit each other. On the high level, we run a classical mixed-integer program for global optimization of block-type selection and of the blocks’ final poses to recreate the desired shape. Its output is then exploited as a prior to efficiently guide the exploration of an underlying reinforcement learning (RL) policy that handles decisions regarding structural stability and robotic feasibility. This RL policy draws its generalization properties from a flexible graph-based neural network that is trained through Q-learning and can be refined with search. Lastly, a grasp and motion planner transforms the desired assembly commands into robot joint movements. We demonstrate our method on a set of competitive simulated and real-world robot assembly discovery environments and report performance and robustness gains over an unstructured graph-based end-to-end approach.
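To make the high-level step concrete, the sketch below illustrates the kind of subproblem it solves: selecting block types and their final poses so that the chosen blocks exactly recreate a target shape. The abstract describes a mixed-integer program for this; here a toy backtracking exact-cover search over a small 2D grid stands in for it, and the block names and shapes are illustrative assumptions, not taken from the paper.

```python
# Toy stand-in for the high-level step: choose block types and poses to
# exactly cover a target shape. The actual method uses a mixed-integer
# program; this backtracking search on a small grid only illustrates
# the block-type/pose selection problem. Block shapes are hypothetical.

BLOCKS = {
    "I2": [(0, 0), (0, 1)],                   # 1x2 bar
    "L3": [(0, 0), (1, 0), (1, 1)],           # small L-piece
    "O4": [(0, 0), (0, 1), (1, 0), (1, 1)],   # 2x2 square
}

def rotations(cells):
    """All distinct 90-degree rotations of a block, normalized to the origin."""
    seen, out = set(), []
    for _ in range(4):
        cells = [(c, -r) for r, c in cells]   # rotate 90 degrees
        minr = min(r for r, _ in cells)
        minc = min(c for _, c in cells)
        norm = tuple(sorted((r - minr, c - minc) for r, c in cells))
        if norm not in seen:
            seen.add(norm)
            out.append(list(norm))
    return out

def tile(target):
    """Backtracking exact cover: return [(block, offset, cells), ...] or None."""
    target = set(target)
    if not target:
        return []
    anchor = min(target)                      # fill cells in a fixed order
    for name, base in BLOCKS.items():
        for rot in rotations(base):
            for ref in rot:
                # place the block so that cell `ref` lands on `anchor`
                off = (anchor[0] - ref[0], anchor[1] - ref[1])
                cells = {(r + off[0], c + off[1]) for r, c in rot}
                if cells <= target:
                    rest = tile(target - cells)
                    if rest is not None:
                        return [(name, off, sorted(cells))] + rest
    return None

# Example: recreate a 2x3 rectangular target shape from the block set.
plan = tile({(r, c) for r in range(2) for c in range(3)})
```

In the full hierarchy, such a high-level plan would then serve only as a prior: the lower levels still decide a stable, robot-feasible build order and the actual grasps and joint motions.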