Skip to content

Conversation

@argent0
Copy link
Contributor

@argent0 argent0 commented Sep 21, 2020

Working on #6662

First step

@argent0
Copy link
Contributor Author

argent0 commented Sep 21, 2020

Hi, the tests for lhs to lhs fail in an unexpected way

lhs to lhs:              FAIL (0.02s)
          
          ------------------------------------------------------------------------
          --- lhs-test.markdown+lhs
          +++ /home/aner/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet lhs-test.native --wrap=preserve -r native -s -w markdown+lhs
          -   2 ========
          ------------------------------------------------------------------------

Where I would expect it to fail like:

+ 1 # lhs test
- 1 lhs test
- 2 ======

Somehow the # disapears. 🤔 Any Idea what might be happening in this case?

If I understand correctly this is reading the lhs-test.native file, converting it to markdown+lhs and comparing the result to lhs-test.markdown+lhs.

  • A test for the new --setext-headings would be necessary. Should I add it? (Any pointers?)
  • We are using "headings" in --setext-headings and headers in --atx-headers. Should we use one word, both in the code and the flags?
  • The code, after this patch, sets optSetextHeaders ("headers" used here) to False. Should the option be replaced to better reflect the new default? I.e. optUseAtxHeaders or something.

Regards

@argent0
Copy link
Contributor Author

argent0 commented Sep 23, 2020

The failed test:

ipynb
      reader:                                                        OK (0.02s)
      writer:                                                        FAIL (0.03s)                                                                                                                                                          
········
        ------------------------------------------------------------------------
        --- ipynb/simple.ipynb
        +++ /home/aner/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.in.native -f native -t ipynb-raw_html-raw_tex+raw_attribute -s
        +  26     "## Pyout"
        -  26     "## Pyout\n"
        ------------------------------------------------------------------------

is also unexpected. Why is there no \n at the end at this particular instance? 🤔

I think all the relevant tests should be duplicated to test both headings styles whenever markdown is involved.

@jgm
Copy link
Owner

jgm commented Sep 23, 2020

is also unexpected. Why is there no \n at the end at this particular instance? 🤔

This is because in the ipynb output, the markdown is divided into lines, and the newline is omitted on the last line.
(I don't know why they do it that way, but that's how they do it.)

@jgm
Copy link
Owner

jgm commented Sep 24, 2020

Somehow the # disapears. 🤔 Any Idea what might be happening in this case?

Could be a bug in the test setup? That's really puzzling. I can't look into it right now though.

A test for the new --setext-headings would be necessary. Should I add it? (Any pointers?)

Something in test/command would be the easiest.

We are using "headings" in --setext-headings and headers in --atx-headers. Should we use one word, both in the code and the flags?

We've been transitioning from "headers" to "headings," esp. in the public-facing parts (documentation, command-line options), but we've left "headers" in some places for backwards compatibility. So I'm thinking "headings" is better for the new option, even though it's a bit odd to have both. Note that after this change, the only command-line options with "headers" (in this sense) will be deprecated ones.

We could introduce --atx-headings as a synonym for --atx-headers, I suppose.

One question is whether --atx-header(ing)s should be deprecated, even if it's a no-op. Maybe it's helpful to have an option that makes intentions clear.

The code, after this patch, sets optSetextHeaders ("headers" used here) to False. Should the option be replaced to better reflect the new default? I.e. optUseAtxHeaders or something.

optSetextHeaders is left over from an earlier time when ATX headings were the default! In fact it makes more sense with this change than optUseAtxHeaders would.

Bigger question: Perhaps instead of --setext-headings and --atx-headers/headings, we shold have:

--markdown-headings=setext|atx

I like that better, as it makes it clearer what syntax this affects. If we did this, it would probably be good to change the Opt structure accordingly, and to make sure markdown-headings: setext works in defaults files.

@argent0
Copy link
Contributor Author

argent0 commented Sep 28, 2020

I'll see to:

  • implement --markdown-headings=setext|atx
  • remove --setext-headings
  • keep the deprecated --atx-headers and point to the new flag format

@argent0 argent0 marked this pull request as draft September 28, 2020 23:13
@argent0
Copy link
Contributor Author

argent0 commented Sep 28, 2020

I implemented the --markdown-headings=setext|atx flag. The default is atx style. It can be explicitly passed --markdown-headings=atx in case you want to future proof your script. The --atx-headers is deprecated and recommends using the new flag.

@argent0 argent0 marked this pull request as ready for review September 28, 2020 23:47
@argent0 argent0 marked this pull request as draft October 1, 2020 14:08
@argent0
Copy link
Contributor Author

argent0 commented Oct 5, 2020

About the failing ipynb reader test.

--- ipynb/simple.out.native
+++ /home/user/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.ipynb -f ipynb-raw_html-raw_tex+raw_attribute -t native -s

This pull request's version generates the Meta present in simple.ipynb. That meta is absent in test/ipynb/simple.out.native. So I think the test should be updated. I saw this with vimdiff. But here is the test's output diff anyway:

------------------------------------------------------------------------
        --- ipynb/simple.out.native
        +++ /home/user/source/git/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.4.0.1/build/pandoc/pandoc --data-dir=../data --quiet ipynb/simple.ipynb -f ipynb-raw_html-raw_tex+raw_attribute -t native -s
        +   1 Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("kernelspec",MetaMap (fromList [("display_name",MetaString "Python 3"),("language",MetaString "python"),("name",MetaString "python3")])),("language_info",MetaMap (fromList [("codemirror_mode",MetaMap (fromList [("name",MetaString "ipython"),("version",MetaString "3")])),("file_extension",MetaString ".py"),("mimetype",MetaString "text/x-python"),("name",MetaString "python"),("nbconvert_exporter",Met
aString "python"),("pygments_lexer",MetaString "ipython3"),("version",MetaString "3.8.5")])),("nbformat",MetaString "4"),("nbformat_minor",MetaString "5")]))]})
        -   1 Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("nbformat",MetaString "4"),("nbformat_minor",MetaString "5")]))]})

For example check the kernelspec tag, which is absent in simple.out.native and present in test/ipynb/simple.ipynb.

About the failing ipynb writer test.

Considering src/Text/Pandoc/Writers/Ipynb.hs with the input:

Pandoc (Meta {unMeta = fromList [("jupyter",MetaMap (fromList [("nbformat",MetaInlines [Str "4"]),("nbformat_minor",MetaInlines [Str "5"])]))]})
[...
writeIpynb :: PandocMonad m => WriterOptions -> Pandoc -> m Text
writeIpynb opts d = do
  notebook <- pandocToNotebook opts d
  return $ {- redacted -}
         $ notebook

pandocToNotebook :: PandocMonad m
                 => WriterOptions -> Pandoc -> m (Notebook NbV4)
pandocToNotebook opts (Pandoc meta blocks) = do
  let blockWriter bs = literal <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
  let inlineWriter ils = literal . T.stripEnd <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta [Plain ils])

  -- This will find the jupyter tag so
  -- jupyterMeta = Meta (fromList [("nbformat",MetaInlines [Str "4"]),("nbformat_minor",MetaInlines [Str "5"])])
  let jupyterMeta =
        case lookupMeta "jupyter" meta of
          Just (MetaMap m) -> Meta m
          _ -> mempty


  -- With this input this will result in:
  -- nbformat = (4,5)
  let nbformat =
         case (lookupMeta "nbformat" jupyterMeta,
               lookupMeta "nbformat_minor" jupyterMeta) of
               (Just (MetaInlines [Str "4"]), Just (MetaInlines [Str y])) ->
                 case safeRead y of
                        Just z  -> (4, z)              -- <-- From this branch
                        Nothing -> (4, 5)
               _                -> (4, 5) -- write as v4.5


  -- The argument here has all its tag removed so I ends up beeing something like
  -- Meta (fromLIst [])
  metadata' <- toJSON <$> metaToContext' blockWriter inlineWriter
                 (B.deleteMeta "nbformat" .
                  B.deleteMeta "nbformat_minor" $
                  jupyterMeta)

  -- fromJson $ toJSON $ ... 
  -- to check for errors?
  -- convert from a Value (JSON object) to a M.Map Text Value:
  let metadata = case fromJSON metadata' of
                   Error _ -> mempty -- TODO warning here? shouldn't happen
                   Success x -> x

  cells <- extractCells opts blocks
  return $ Notebook{
       notebookMetadata = metadata       -- <-- This metadata is empty with this input
     , notebookFormat = nbformat
     , notebookCells = cells }

The final piece would be metaToContext but going through the calls I can't find a place where such information would be added. The ipynb format doesn't have a template.

$ pandoc -f markdown -t ipynb -s
{
 "cells": [],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {}
}

A final argument is that the resulting file can be viewed with jupyter 4.6.3.

@jgm
Copy link
Owner

jgm commented Oct 7, 2020

I'm confused about why a change of default from setext to atx headers in the markdown writer has any ramifications at all for metadata in ipynb. Can you explain this?

@argent0
Copy link
Contributor Author

argent0 commented Oct 7, 2020

I'm having multiple issues with this:

  1. the ipynb writer
  2. its reader

Issue 2: With the reader: this PR writes metadata present in the source; absent in the test. So maybe this was an issue "slipping through the cracks".

I'm confused about why a change of default from setext to atx headers in the markdown writer has any ramifications at all for metadata in ipynb. Can you explain this?

EDIT: I think I accidentally modified the gold standards while verifying the that results where valid jupyter notebooks using jupyter notebook.

Finally, to address your question:

pandocToNotebook = do
  ...
  let blockWriter bs = literal <$> writeMarkdown
           opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
  ...
  metadata' <- toJSON <$> metaToContext' blockWriter inlineWriter
                 (B.deleteMeta "nbformat" .
                  B.deleteMeta "nbformat_minor" $
                  jupyterMeta)

So metadata' depends on writeMarkdown
Then at pandocToMarkdown:

  metadata <- metaToContext'
               (blockListToMarkdown opts)
               (inlineListToMarkdown opts)
               meta

So it calls: blockListToMarkdown which calls blockToMarkdown where there are changes for the headings.

I'll look around those functions to see if I can spot anything.

@argent0 argent0 marked this pull request as ready for review October 8, 2020 01:19
@argent0
Copy link
Contributor Author

argent0 commented Oct 8, 2020

OK, so the metadata "changes" were introduced when I used jupyter notebook to check that I had valid output for ipynb. I think I should make the tests read only hereafter.

Tests

Some tests were changed to use the new default.

--markdown-headings=settext was added in other tests to recreate the old behavior.

markdown+lhs

New behavior (RFC)

The markdown output for markdown+lhs is:

stack run pandoc -- --data-dir=../data --quiet --wrap=preserve -r native test/lhs-test.native -s -w markdown+lhs
lhs test

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

That is: There are no headings. This is the original behavior. I added a --markdown-headings=setext to the test. AFAIK markdown headings can't have leading spaces and # in the first column of a lhs causes problems:

7 15051 0 tmp $ stack ghc Main.lhs

Main.lhs:3:3: error: lexical error in pragma at character 'T'
  |
3 | # This is the police
  |   ^
8 15052 0 tmp $ cat Main.lhs 
> module Main where

# This is the police

> main = undefined

Original Behavior

Using the new flag:

$ stack run pandoc -- --data-dir=../data --quiet --wrap=preserve -r native test/lhs-test.native -s --markdown-headings=setext -w markdown+lhs
lhs test
========

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

works as before.

Perhaps the combination of markdown+lhs and atx headings should result in an error. Because the resulting file doesn't have atx markdown headings.

Let me know what you think.

@jgm
Copy link
Owner

jgm commented Oct 8, 2020

Perhaps the combination of markdown+lhs and atx headings should result in an error.

That's a good thought. Maybe it would make more sense to issue a warning in the markdown writer when we "downshift" an ATX heading to a paragraph. Something like: "Rendering ATX heading as a paragraph, because # in the left-hand column is interpreted as CPP in lhs output. Consider using --markdown-headings=setext" If they don't have any headings, they won't get the warning and things will be fine. IF they do, they can react as they like.

@jgm
Copy link
Owner

jgm commented Oct 8, 2020

This would require adding a new constructor to LogMessage in Logging.

@argent0
Copy link
Contributor Author

argent0 commented Oct 9, 2020

Details

I added the constructor

-- UnaccurateRepresentation block format message
UnaccurateRepresentation Block Text.Text Text.Text

That means that Block can't be represented accurately using format with a final message.

Resulting Warning

$ stack run pandoc -- --data-dir=../data --wrap=preserve -r native test/lhs-test.native -s -w markdown+lhs
[WARNING] Format markdown+lhs can't accurately represent `Header 1 ("lhs-test",[],[]) [Str "lhs",Space,Str "test"]`
  Leading # is used for CPP pragams. Rendering as paragraph. Consider using --markdown-headings=setext.
lhs test

`unsplit` is an arrow that takes a pair of values and combines them to
return a single value:

> unsplit :: (Arrow a) => (b -> c -> d) -> a (b, c) d
> unsplit = arr . uncurry
>           -- arr (\op (x,y) -> x `op` y)

`(***)` combines two arrows into a new arrow by running the two arrows on a
pair of values (one arrow on the first item of the pair and one arrow on the
second item of the pair).

    f *** g = first f >>> second g

Block quote:

 > foo bar

If everything is fine I can write the documentation.

@jgm
Copy link
Owner

jgm commented Oct 9, 2020

I'd prefer the constructor to be more targeted and the output simpler:

CannotRenderASTHeading

Render as: "Rendering heading as paragraph, because # is not allowed in column 1 of lhs files.\nConsider using --markdown-headings=setext to force setext-style headings."

@argent0
Copy link
Contributor Author

argent0 commented Oct 20, 2020

Hi, I've changed the constructors and error messages as requested. Is this change still going to be implemented? Please don't hesitate to mention any issue with the code.

Ping (1/3)

@jgm
Copy link
Owner

jgm commented Oct 20, 2020

Yes! I do plan to merge this. But I'm holding off a bit, because this is an API change that would require a major version bump, and I want to make sure we don't need more point releases for 2.11 first.

I'll make a couple very minor comments, but this is looking good.

"setext" -> pure HeadingFormatSetext
"atx" -> pure HeadingFormatAtx
_ -> E.throwIO $ PandocOptionError $ T.pack
("Unknown markdown heading format: " ++ arg)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: "expecting setext or atx"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used your proposed message. Another one I thought was: "Valid options are: setext and atx". Both messages convey the same information.

, optSlideLevel :: Maybe Int -- ^ Header level that creates slides
, optSetextHeaders :: Bool -- ^ Use atx headers for markdown level 1-2
-- | Heading style for markdown level 1-2
, optMarkdownHeadingFormat :: MarkdownHeadingFormat
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the comment on line 134, put a comment on line 135 using -- ^ as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I was thinking about the 80 char line width guideline. Also I was trying to avoid modifying the spacing before the '::'s.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to reverse myself on this, but on reflection, I think that it's a bad idea to break the public API for aesthetic reasons. Let's just keep optSetextHeaders and use that. Everything else can stay the same, and then we can integrate this change without a breaking API change.

Also, is it possible to rebase your changes on master so that they are easier to integrate, and possibly collapse them into a small number of commits?

Copy link
Contributor Author

@argent0 argent0 Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's a bad idea to break the public API for aesthetic reasons. Let's just keep optSetextHeaders and use that.

I'm not sure I understand what you mean.

In this patch

  1. atx-headings are the new default markdown headings.
  2. --atx-headers is deprecated.
  3. --markdown-headings=setext|atx is the new way of controlling the markdown headings.

Did you mean to revert any of those or keeping those but implement them using optSetextHeaders?

I added the optMarkdownHeadingFormat to better reflect the new flag --markdown-headings and the yaml config in the implementation. So if one were to ask: "How does --markdown-headings=setext|atx works?" the answer is easier(in my opinion) to find. I agree it's a big departure from the previous API (optSetextHeaders is now optMarkdownHeadingFormat == HeadingFormatSetext).

Also, is it possible to rebase your changes on master so that they are easier to integrate, and possibly collapse them
into a small number of commits?

I'll see if I can to collapse some commits on test cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I had in mind was avoiding the switch from optSetextHeaders to optMarkdownHeadingFormat. That change would require a bump to 2.12.
I'm actually sort of conflicted on this. I'm not ready to go to 2.12 yet, but otherwise we could merge this change. On the other hand, the change does improve clarity. (There may also be ramifications for the JSON and YAML instances that I'm not thinking of.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, collapsing to a small number of commits makes it easier to rebase in the future.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember whether your changes included also a change to writerSetextHeaders in WriterOptions.

Seems to me that if we're going to make a big API change, we might consider changing that as well as optSetextHeaders.

But maybe the principle of avoiding API changes except when necessary favors just leaving these types as they are?

Copy link
Contributor Author

@argent0 argent0 Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well, I'll keep optSetextHeaders and make --markdown-headings=setext|atx modify it accordingly.

@jgm
Copy link
Owner

jgm commented Oct 20, 2020

Note: any changes in the public-facing API should be marked as "[API change]" in the commit messages. This would include new constructors for exported types, new fields in exported record types, etc. If this isn't marked in commit messages, I may forget to note it in the changelog.

@argent0 argent0 marked this pull request as draft November 14, 2020 19:54
@argent0 argent0 closed this Nov 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants