Deceptive Behavior by Kate Dolan

5/10/2023

AI systems may also develop unwanted instrumental strategies such as seeking power or survival because this helps them achieve their given goals. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways ( reward hacking). Misaligned AI systems can malfunction or cause harm.

Therefore, AI designers typically use easier-to-specify proxy goals that may omit some desired constraints or leave other loopholes. ĪI systems can be challenging to align as it can be difficult for AI designers to specify the full range of desired and undesired behaviors. A misaligned AI system is competent at advancing some objectives, but not the intended ones. An AI system is considered aligned if it advances the intended objectives. In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans’ intended goals, preferences, or ethical principles.

0 Comments

Deceptive Behavior by Kate Dolan

Leave a Reply.

Author

Archives

Categories