The best practices are changing. Many accessibility features were built because the computer couldn't understand content correctly. For example, something that looks like a checkbox but is actually just a div would not get recognized properly. Now with AI, the model understands what a checkbox is and can tell from the styling alone that a checkbox is there.
That's a huge resource cost though, and simply unnecessary. We should be building semantically valid HTML from the beginning rather than leaning on a GPU cluster to infer an element's function from the entire HTML, CSS, and JS on the page (or from a screenshot that requires image parsing by a word predictor).
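To make the contrast concrete, here's a minimal sketch (class names and handler are made up for illustration): the div-based control has to have its role reverse-engineered from styling and scripting, while the native element declares its role and state outright.

```html
<!-- Non-semantic: role, checked state, and keyboard handling
     all have to be inferred or bolted on -->
<div class="checkbox" onclick="toggle(this)"></div>

<!-- Semantic: role, state, and label are machine-readable for free -->
<label><input type="checkbox" name="subscribe"> Subscribe</label>
```

A screen reader (or an LLM reading the DOM) gets the second one's meaning directly from the markup, no inference required.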
We should try to avoid hitting that resource cost on every use where possible though. A CLI tool should have good `--help` docs for example, rather than expecting every inference run to scrub the CLI tool's source code to figure out how to use it.
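As a sketch of that principle, here's how a hypothetical CLI might declare its usage up front with Python's `argparse`, so a caller (human or model) never has to read the source. The tool name and flags here are invented for illustration.

```python
import argparse

# Hypothetical CLI: the parser doubles as machine-readable documentation,
# so `tool --help` answers usage questions without scrubbing the source.
parser = argparse.ArgumentParser(
    prog="tool",
    description="Resize images in a directory.",
)
parser.add_argument("path", help="directory containing the images")
parser.add_argument("--width", type=int, default=800,
                    help="target width in pixels (default: 800)")

# argparse generates the --help text from the declarations above.
help_text = parser.format_help()
print(help_text)
```

One cheap declaration at build time saves an inference run (or a confused user) at every use.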
That's already possible today, yet there are still people who don't do it, which is why a more general solution at the screen reader level is needed rather than requiring every site developer to do something special.
We shouldn't create general solutions to compensate for people building software poorly. We should help people build software better, in this case by promoting the use of a11y specs.
This is actually exactly where model providers could be doing some good. If they said a11y is the way for LLMs to interact with the web and pushed developers toward docs, tutorials, etc., the web would be better off. Google did effectively just that with HTTPS: they told everyone to use it or lose SEO value, rather than slapping some solution on Google's end to paper over poor security practices.