There is a simple behavioral test that would provide significant evidence about whether AIs with a given rough set of characteristics develop subversive goals.
Testing for Scheming with Model Deletion