An out-of-date dagent was the last thing support mentioned before closing the ticket. That makes sense now: when I retry the task and it completes successfully, the dagent has already updated itself. Below is what they said:
If you have a 3.0 version dagent included in your image, then as soon as the machine boots into the OS after the deploy, the first thing the dagent does is contact the server. If the server detects an incorrect version, the first thing it does is update the dagent so communication can continue. For most people that isn't an issue, because the update generally finishes before the join domain step runs. For some reason, it is possible that the join domain step is running while the server is still updating the dagent. If the update takes a fraction of a second longer in your environment than we expect, it may derail the communication needed to join the domain. If this is the case, the only way to guarantee it works every time, on the first attempt, is to build new images with the 3.1 dagent included.
So it sounds like the dagent is the issue. All of our images were created with the old agent on them, and rebuilding every image with the new agent is going to take some time. It seems like poor design that the old agent can't talk to the GSS Server long enough to complete a basic task.
I wonder if there is a way to force the agent update earlier, or to manually add the new agent to the existing images.