Fix skipping parsing character #384
@seaburg, thank you for your PR. We love those ;=))

While your commit 7d28b21 certainly addresses an issue of fixing the bad html of

Now I have tried to reconstruct that old 762102, now sf 486, created 2003-06-27, by Igor Katraev, but I have not been very successful... not sure exactly what the input was supposed to be... too many 'escapes'!

Your fix certainly seems to produce a better output in the case of a duplicated leading

And maybe you have good reason why this old fix code was removed... in which case your PR is good... and will be merged... after testing...

Please explain...
The fix for bug 762102 is a solution for a special case of the problem ( Example:
The proposed solution seems to be a more general fix for the aforementioned https://sourceforge.net/p/tidy/bugs/486/
@seaburg, @smirn0v, yes I can see this is a more general change for every

Hopefully, if I get the time, in just a few more days...
@seaburg, @smirn0v, ok after some more testing, I am re-thinking this...

Really, all we want here is to put back certain characters following the

Previously we only put back an

And this issue specifically identifies also if the second character is another

But should every next non-letter be put back in the stream to be re-assessed? The state is already being changed to

So, at this point I am leaning towards a simpler fix of the current

Are there any other 2nd non-letter characters that need to be re-assessed?

Seek, and appreciate, more comment and testing on this...

If we can resolve this quickly, it can make it into 5.2, otherwise I will move the milestone to 5.3, since we hope to shortly issue a new 5.2 release... thanks...
No, this is not enough. In this case necessary

But an extra condition complicates the logic. So is it not better to put any non-letter character back into the buffer?
@seaburg yes, on tracing through it all again, and again, I am coming around to agreeing ;=))

It does seem better to put it back, back up the lexer, and let

Nearly out of time today, but should be able to get around to merging this tomorrow... thanks for hanging in there...
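
For readers following the thread, here is a minimal sketch of the put-back idea agreed on above. It is not Tidy's lexer code: `ToyStream`, `toy_read`, `toy_unget`, `emit` and `toy_escape` are hypothetical names, and the rule that only a letter or '/' after '<' starts a tag is a simplifying assumption made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy input stream; NOT Tidy's real stream type. */
typedef struct {
    const char *src;
    size_t pos;
} ToyStream;

/* Read one character, or return -1 at end of input. */
static int toy_read(ToyStream *in)
{
    unsigned char c = (unsigned char)in->src[in->pos];
    if (c == '\0')
        return -1;
    in->pos++;
    return c;
}

/* Put the most recently read character back so it is read again. */
static void toy_unget(ToyStream *in)
{
    if (in->pos > 0)
        in->pos--;
}

/* Append s to out; text that would not fit is silently dropped. */
static void emit(char *out, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n < cap) {
        memcpy(out + *len, s, n);
        *len += n;
        out[*len] = '\0';
    }
}

/*
 * Copy html to out, escaping any '<' that does not start a tag and any
 * '>' outside a tag. Assumption: a tag starts with '<' followed by a
 * letter or '/'. Any other character after '<' is put back, so that a
 * second '<' is re-assessed and can still open a tag.
 */
void toy_escape(const char *html, char *out, size_t cap)
{
    ToyStream in = { html, 0 };
    size_t len = 0;
    char ch[2] = { 0, 0 };
    int c;

    if (cap == 0)
        return;
    out[0] = '\0';

    while ((c = toy_read(&in)) != -1) {
        if (c == '<') {
            int next = toy_read(&in);
            if (next != -1 && (isalpha(next) || next == '/')) {
                /* Looks like a tag: copy it through its closing '>'. */
                emit(out, cap, &len, "<");
                ch[0] = (char)next;
                emit(out, cap, &len, ch);
                while ((c = toy_read(&in)) != -1) {
                    ch[0] = (char)c;
                    emit(out, cap, &len, ch);
                    if (c == '>')
                        break;
                }
            } else {
                /* Not a tag: escape '<' and put the next char back. */
                emit(out, cap, &len, "&lt;");
                if (next != -1)
                    toy_unget(&in);
            }
        } else if (c == '>') {
            emit(out, cap, &len, "&gt;");
        } else {
            ch[0] = (char)c;
            emit(out, cap, &len, ch);
        }
    }
}
```

Under these assumptions, calling `toy_escape("<<a>x</a>>", buf, sizeof buf)` leaves `&lt;<a>x</a>&gt;` in `buf`: the second '<' is put back and then read as the start of the `a` tag. If it were consumed instead, the lexer would resume at 'a' and the tag would be lost.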
@seaburg now merged... thanks... added comments, and bumped version to 5.1.49...
Hi!
Example:
<<a href="https://github.com/htacg/tidy-html5">tidy-html5</a>>
After an unsuccessful attempt to parse '<<', parsing continues with the 'a' character (instead of '<').
Output:
<<a href="https://github.com/htacg/tidy-html5">tidy-html5</a>>
Instead of:
<<a href="https://github.com/htacg/tidy-html5">tidy-html5</a>>