Change writerSetextHeaders in default to False. #6693

argent0 · 2020-09-21T16:09:11Z

Working on #6662

First step

argent0 · 2020-09-21T19:28:35Z

Hi, the tests for lhs to lhs fail in an unexpected way

lhs to lhs:              FAIL (0.02s)
          
          ------------------------------------------------------------------------
          --- lhs-test.markdown+lhs
          +++ /home/aner/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet lhs-test.native --wrap=preserve -r native -s -w markdown+lhs
          -   2 ========
          ------------------------------------------------------------------------

Where I would expect it to fail like:

+ 1 # lhs test
- 1 lhs test
- 2 ======

Somehow the # disapears. 🤔 Any Idea what might be happening in this case?

If I understand correctly this is reading the lhs-test.native file, converting it to markdown+lhs and comparing the result to lhs-test.markdown+lhs.

A test for the new --setext-headings would be necessary. Should I add it? (Any pointers?)
We are using "headings" in --setext-headings and headers in --atx-headers. Should we use one word, both in the code and the flags?
The code, after this patch, sets optSetextHeaders ("headers" used here) to False. Should the option be replaced to better reflect the new default? I.e. optUseAtxHeaders or something.

Regards

argent0 · 2020-09-23T16:24:22Z

The failed test:

ipynb
      reader:                                                        OK (0.02s)
      writer:                                                        FAIL (0.03s)                                                                                                                                                          
········
        ------------------------------------------------------------------------
        --- ipynb/simple.ipynb
        +++ /home/aner/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.in.native -f native -t ipynb-raw_html-raw_tex+raw_attribute -s
        +  26     "## Pyout"
        -  26     "## Pyout\n"
        ------------------------------------------------------------------------

is also unexpected. Why is there no \n at the end at this particular instance? 🤔

I think all the relevant tests should be duplicated to test both headings styles whenever markdown is involved.

jgm · 2020-09-23T23:09:51Z

is also unexpected. Why is there no \n at the end at this particular instance? 🤔

This is because in the ipynb output, the markdown is divided into lines, and the newline is omitted on the last line.
(I don't know why they do it that way, but that's how they do it.)

jgm · 2020-09-24T00:36:22Z

Somehow the # disapears. 🤔 Any Idea what might be happening in this case?

Could be a bug in the test setup? That's really puzzling. I can't look into it right now though.

A test for the new --setext-headings would be necessary. Should I add it? (Any pointers?)

Something in test/command would be the easiest.

We are using "headings" in --setext-headings and headers in --atx-headers. Should we use one word, both in the code and the flags?

We've been transitioning from "headers" to "headings," esp. in the public-facing parts (documentation, command-line options), but we've left "headers" in some places for backwards compatibility. So I'm thinking "headings" is better for the new option, even though it's a bit odd to have both. Note that after this change, the only command-line options with "headers" (in this sense) will be deprecated ones.

We could introduce --atx-headings as a synonym for --atx-headers, I suppose.

One question is whether --atx-header(ing)s should be deprecated, even if it's a no-op. Maybe it's helpful to have an option that makes intentions clear.

The code, after this patch, sets optSetextHeaders ("headers" used here) to False. Should the option be replaced to better reflect the new default? I.e. optUseAtxHeaders or something.

optSetextHeaders is left over from an earlier time when ATX headings were the default! In fact it makes more sense with this change than optUseAtxHeaders would.

Bigger question: Perhaps instead of --setext-headings and --atx-headers/headings, we shold have:

--markdown-headings=setext|atx

I like that better, as it makes it clearer what syntax this affects. If we did this, it would probably be good to change the Opt structure accordingly, and to make sure markdown-headings: setext works in defaults files.

argent0 · 2020-09-28T18:37:31Z

I'll see to:

implement --markdown-headings=setext|atx
remove --setext-headings
keep the deprecated --atx-headers and point to the new flag format

argent0 · 2020-09-28T23:47:16Z

I implemented the --markdown-headings=setext|atx flag. The default is atx style. It can be explicitly passed --markdown-headings=atx in case you want to future proof your script. The --atx-headers is deprecated and recommends using the new flag.

argent0 · 2020-10-05T21:02:42Z

About the failing ipynb reader test.

--- ipynb/simple.out.native
+++ /home/user/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.ipynb -f ipynb-raw_html-raw_tex+raw_attribute -t native -s

This pull request's version generates the Meta present in simple.ipynb. That meta is absent in test/ipynb/simple.out.native. So I think the test should be updated. I saw this with vimdiff. But here is the test's output diff anyway:

------------------------------------------------------------------------
        --- ipynb/simple.out.native
        +++ /home/user/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.ipynb -f ipynb-raw_html-raw_tex+raw_attribute -t native -s
        +   1 Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("kernelspec",MetaMap (fromList [("display_name",MetaString "Python 3"),("language",MetaString "python"),("name",MetaString "python3")])),("language_info",MetaMap (fromList [("codemirror_mode",MetaMap (fromList [("name",MetaString "ipython"),("version",MetaString "3")])),("file_extension",MetaString ".py"),("mimetype",MetaString "text/x-python"),("name",MetaString "python"),("nbconvert_exporter",Met
aString "python"),("pygments_lexer",MetaString "ipython3"),("version",MetaString "3.8.5")])),("nbformat",MetaString "4"),("nbformat_minor",MetaString "5")]))]})
        -   1 Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("nbformat",MetaString "4"),("nbformat_minor",MetaString "5")]))]})

For example check the kernelspec tag, which is absent in simple.out.native and present in test/ipynb/simple.ipynb.

About the failing ipynb writer test.

Considering src/Text/Pandoc/Writers/Ipynb.hs with the input:

Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("nbformat",MetaInlines [Str "4"]),("nbformat_minor",MetaInlines [Str "5"])]))]})
[...

writeIpynb :: PandocMonad m => WriterOptions -> Pandoc -> m Text
writeIpynb opts d = do
  notebook <- pandocToNotebook opts d
  return $ {- redacted -}
         $ notebook

pandocToNotebook :: PandocMonad m
                 => WriterOptions -> Pandoc -> m (Notebook NbV4)
pandocToNotebook opts (Pandoc meta blocks) = do
  let blockWriter bs = literal <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
  let inlineWriter ils = literal . T.stripEnd <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta [Plain ils])

  -- This will find the jupyter tag so
  -- jupyterMeta = Meta (fromList [("nbformat",MetaInlines [Str "4"]),("nbformat_minor",MetaInlines [Str "5"])])
  let jupyterMeta =
        case lookupMeta "jupyter" meta of
          Just (MetaMap m) -> Meta m
          _ -> mempty


  -- With this input this will result in:
  -- nbformat = (4,5)
  let nbformat =
         case (lookupMeta "nbformat" jupyterMeta,
               lookupMeta "nbformat_minor" jupyterMeta) of
               (Just (MetaInlines [Str "4"]), Just (MetaInlines [Str y])) ->
                 case safeRead y of
                        Just z  -> (4, z)              -- <-- From this branch
                        Nothing -> (4, 5)
               _                -> (4, 5) -- write as v4.5


  -- The argument here has all its tag removed so I ends up beeing something like
  -- Meta (fromLIst [])
  metadata' <- toJSON <$> metaToContext' blockWriter inlineWriter
                 (B.deleteMeta "nbformat" .
                  B.deleteMeta "nbformat_minor" $
                  jupyterMeta)

  -- fromJson $ toJSON $ ... 
  -- to check for errors?
  -- convert from a Value (JSON object) to a M.Map Text Value:
  let metadata = case fromJSON metadata' of
                   Error _ -> mempty -- TODO warning here? shouldn't happen
                   Success x -> x

  cells <- extractCells opts blocks
  return $ Notebook{
       notebookMetadata = metadata       -- <-- This metadata is empty with this input
     , notebookFormat = nbformat
     , notebookCells = cells }

The final piece would be metaToContext but going through the calls I can't find a place where such information would be added. The ipynb format doesn't have a template.

$ pandoc -f markdown -t ipynb -s
{
 "cells": [],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {}
}

A final argument is that the resulting file can be viewed with jupyter 4.6.3.

jgm · 2020-10-07T04:00:13Z

I'm confused about why a change of default from setext to atx headers in the markdown writer has any ramifications at all for metadata in ipynb. Can you explain this?

argent0 · 2020-10-07T20:21:27Z

I'm having multiple issues with this:

the ipynb writer
its reader

Issue 2: With the reader: this PR writes metadata present in the source; absent in the test. So maybe this was an issue "slipping through the cracks".

I'm confused about why a change of default from setext to atx headers in the markdown writer has any ramifications at all for metadata in ipynb. Can you explain this?

EDIT: I think I accidentally modified the gold standards while verifying the that results where valid jupyter notebooks using jupyter notebook.

Finally, to address your question:

pandocToNotebook = do
  ...
  let blockWriter bs = literal <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
  ...
  metadata' <- toJSON <$> metaToContext' blockWriter inlineWriter
                 (B.deleteMeta "nbformat" .
                  B.deleteMeta "nbformat_minor" $
                  jupyterMeta)

So metadata' depends on writeMarkdown
Then at pandocToMarkdown:

  metadata <- metaToContext'
               (blockListToMarkdown opts)
               (inlineListToMarkdown opts)
               meta

So it calls: blockListToMarkdown which calls blockToMarkdown where there are changes for the headings.

I'll look around those functions to see if I can spot anything.

argent0 · 2020-10-08T02:04:38Z

OK, so the metadata "changes" were introduced when I used jupyter notebook to check that I had valid output for ipynb. I think I should make the tests read only hereafter.

Tests

Some tests were changed to use the new default.

--markdown-headings=settext was added in other tests to recreate the old behavior.

markdown+lhs

New behavior (RFC)

The markdown output for markdown+lhs is:

stack run pandoc -- --data-dir=../data --quiet --wrap=preserve -r native test/lhs-test.native -s -w markdown+lhs
lhs test

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

That is: There are no headings. This is the original behavior. I added a --markdown-headings=setext to the test. AFAIK markdown headings can't have leading spaces and # in the first column of a lhs causes problems:

7 15051 0 tmp $ stack ghc Main.lhs

Main.lhs:3:3: error: lexical error in pragma at character 'T'
  |
3 | # This is the police
  |   ^
8 15052 0 tmp $ cat Main.lhs 
> module Main where

# This is the police

> main = undefined

Original Behavior

Using the new flag:

$ stack run pandoc -- --data-dir=../data --quiet --wrap=preserve -r native test/lhs-test.native -s --markdown-headings=setext -w markdown+lhs
lhs test
========

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

works as before.

Perhaps the combination of markdown+lhs and atx headings should result in an error. Because the resulting file doesn't have atx markdown headings.

Let me know what you think.

jgm · 2020-10-08T16:58:28Z

Perhaps the combination of markdown+lhs and atx headings should result in an error.

That's a good thought. Maybe it would make more sense to issue a warning in the markdown writer when we "downshift" an ATX heading to a paragraph. Something like: "Rendering ATX heading as a paragraph, because # in the left-hand column is interpreted as CPP in lhs output. Consider using --markdown-headings=setext" If they don't have any headings, they won't get the warning and things will be fine. IF they do, they can react as they like.

jgm · 2020-10-08T16:58:45Z

This would require adding a new constructor to LogMessage in Logging.

argent0 · 2020-10-09T02:20:33Z

Details

I added the constructor

-- UnaccurateRepresentation block format message
UnaccurateRepresentation Block Text.Text Text.Text

That means that Block can't be represented accurately using format with a final message.

Resulting Warning

$ stack run pandoc -- --data-dir=../data --wrap=preserve -r native test/lhs-test.native -s -w markdown+lhs
[WARNING] Format markdown+lhs can't accurately represent `Header 1 ("lhs-test",[],[]) [Str "lhs",Space,Str "test"]`
  Leading # is used for CPP pragams. Rendering as paragraph. Consider using --markdown-headings=setext.
lhs test

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

If everything is fine I can write the documentation.

jgm · 2020-10-09T04:14:49Z

I'd prefer the constructor to be more targeted and the output simpler:

CannotRenderASTHeading

Render as: "Rendering heading as paragraph, because # is not allowed in column 1 of lhs files.\nConsider using --markdown-headings=setext to force setext-style headings."

argent0 · 2020-10-20T00:29:44Z

Hi, I've changed the constructors and error messages as requested. Is this change still going to be implemented? Please don't hesitate to mention any issue with the code.

Ping (1/3)

jgm · 2020-10-20T05:18:56Z

Yes! I do plan to merge this. But I'm holding off a bit, because this is an API change that would require a major version bump, and I want to make sure we don't need more point releases for 2.11 first.

I'll make a couple very minor comments, but this is looking good.

src/Text/Pandoc/App/CommandLineOptions.hs

jgm · 2020-10-20T05:20:14Z

src/Text/Pandoc/App/CommandLineOptions.hs

+                        "setext" -> pure HeadingFormatSetext
+                        "atx" -> pure HeadingFormatAtx
+                        _ -> E.throwIO $ PandocOptionError $ T.pack
+                          ("Unknown markdown heading format: " ++ arg)


Add: "expecting setext or atx"?

I used your proposed message. Another one I thought was: "Valid options are: setext and atx". Both messages convey the same information.

jgm · 2020-10-20T05:20:57Z

src/Text/Pandoc/App/Opt.hs

    , optSlideLevel            :: Maybe Int  -- ^ Header level that creates slides
-    , optSetextHeaders         :: Bool       -- ^ Use atx headers for markdown level 1-2
+    -- | Heading style for markdown level 1-2
+    , optMarkdownHeadingFormat :: MarkdownHeadingFormat


Instead of the comment on line 134, put a comment on line 135 using -- ^ as above.

Done. I was thinking about the 80 char line width guideline. Also I was trying to avoid modifying the spacing before the '::'s.

I'm sorry to reverse myself on this, but on reflection, I think that it's a bad idea to break the public API for aesthetic reasons. Let's just keep optSetextHeaders and use that. Everything else can stay the same, and then we can integrate this change without a breaking API change.

Also, is it possible to rebase your changes on master so that they are easier to integrate, and possibly collapse them into a small number of commits?

I think that it's a bad idea to break the public API for aesthetic reasons. Let's just keep optSetextHeaders and use that.

I'm not sure I understand what you mean.

In this patch

atx-headings are the new default markdown headings.

--atx-headers is deprecated.

--markdown-headings=setext|atx is the new way of controlling the markdown headings.

Did you mean to revert any of those or keeping those but implement them using optSetextHeaders?

I added the optMarkdownHeadingFormat to better reflect the new flag --markdown-headings and the yaml config in the implementation. So if one were to ask: "How does --markdown-headings=setext|atx works?" the answer is easier(in my opinion) to find. I agree it's a big departure from the previous API (optSetextHeaders is now optMarkdownHeadingFormat == HeadingFormatSetext).

Also, is it possible to rebase your changes on master so that they are easier to integrate, and possibly collapse them
into a small number of commits?

I'll see if I can to collapse some commits on test cases.

I guess what I had in mind was avoiding the switch from optSetextHeaders to optMarkdownHeadingFormat. That change would require a bump to 2.12.
I'm actually sort of conflicted on this. I'm not ready to go to 2.12 yet, but otherwise we could merge this change. On the other hand, the change does improve clarity. (There may also be ramifications for the JSON and YAML instances that I'm not thinking of.)

In any case, collapsing to a small number of commits makes it easier to rebase in the future.

I can't remember whether your changes included also a change to writerSetextHeaders in WriterOptions.

Seems to me that if we're going to make a big API change, we might consider changing that as well as optSetextHeaders.

But maybe the principle of avoiding API changes except when necessary favors just leaving these types as they are?

Very well, I'll keep optSetextHeaders and make --markdown-headings=setext|atx modify it accordingly.

src/Text/Pandoc/Logging.hs

src/Text/Pandoc/Options.hs

jgm · 2020-10-20T05:29:47Z

Note: any changes in the public-facing API should be marked as "[API change]" in the commit messages. This would include new constructors for exported types, new fields in exported record types, etc. If this isn't marked in commit messages, I may forget to note it in the changelog.

argent0 marked this pull request as draft September 28, 2020 23:13

argent0 marked this pull request as ready for review September 28, 2020 23:47

argent0 marked this pull request as draft October 1, 2020 14:08

argent0 marked this pull request as ready for review October 8, 2020 01:19

argent0 mentioned this pull request Oct 13, 2020

Difference in gfm output in Pandoc 2.11 #6740

Closed

jgm reviewed Oct 20, 2020

View reviewed changes

src/Text/Pandoc/App/CommandLineOptions.hs Outdated Show resolved Hide resolved

jgm reviewed Oct 20, 2020

View reviewed changes

src/Text/Pandoc/Logging.hs Outdated Show resolved Hide resolved

jgm reviewed Oct 20, 2020

View reviewed changes

src/Text/Pandoc/Logging.hs Outdated Show resolved Hide resolved

jgm reviewed Oct 20, 2020

View reviewed changes

src/Text/Pandoc/Options.hs Outdated Show resolved Hide resolved

argent0 marked this pull request as draft November 14, 2020 19:54

argent0 closed this Nov 14, 2020

argent0 mentioned this pull request Nov 14, 2020

[API change] Atx headings new default. #6837

Closed

Uh oh!

Change writerSetextHeaders in default to False. #6693

Change writerSetextHeaders in default to False. #6693

Uh oh!

Conversation

argent0 commented Sep 21, 2020

Uh oh!

argent0 commented Sep 21, 2020

Uh oh!

argent0 commented Sep 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jgm commented Sep 23, 2020

Uh oh!

jgm commented Sep 24, 2020

Uh oh!

argent0 commented Sep 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

argent0 commented Sep 28, 2020

Uh oh!

argent0 commented Oct 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jgm commented Oct 7, 2020

Uh oh!

argent0 commented Oct 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

argent0 commented Oct 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tests

markdown+lhs

New behavior (RFC)

Original Behavior

Uh oh!

jgm commented Oct 8, 2020

Uh oh!

jgm commented Oct 8, 2020

Uh oh!

argent0 commented Oct 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Details

Resulting Warning

Uh oh!

jgm commented Oct 9, 2020

Uh oh!

argent0 commented Oct 20, 2020

Uh oh!

jgm commented Oct 20, 2020

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

argent0 Nov 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

argent0 Nov 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jgm commented Oct 20, 2020

argent0 commented Sep 23, 2020 •

edited

Loading

argent0 commented Sep 28, 2020 •

edited

Loading

argent0 commented Oct 5, 2020 •

edited

Loading

argent0 commented Oct 7, 2020 •

edited

Loading

argent0 commented Oct 8, 2020 •

edited

Loading

argent0 commented Oct 9, 2020 •

edited

Loading

argent0 Nov 6, 2020 •

edited

Loading

argent0 Nov 12, 2020 •

edited

Loading