Unicode symbols in Markdown #90

Open
dsevastianov opened this issue Jan 5, 2014 · 10 comments

@dsevastianov

Unicode HTML tags like α α don't seem to work in .fsx or Markdown comments in code. The workaround with LaTeX syntax is pretty simple, though.

@DavidSSL

DavidSSL commented Apr 7, 2023

I cannot quite replicate this problem.

I've tried this:

(**
> You can use quotes α α
*)

as well as:

# Home Page

α α This file will load as the root of your docs

and both display fine

image

image

Can someone else reproduce the problem?

@nhirschey
Collaborator

Maybe the fix was #464?

Does that look like it'd fix it, @DavidSSL?

@DavidSSL

DavidSSL commented Apr 7, 2023

@nhirschey, it is difficult to tell 100% if that fixed it because it's not clear what the original problem in this ticket was.

However, I believe the fix in #464 would have had an impact on the bug in this ticket, and the fact that I can't reproduce it with the examples I've provided would appear to confirm that.

My guess is that we could close this ticket and re-open it if someone can reproduce the original bug.

@nhirschey
Collaborator

Thanks for checking @DavidSSL. Before closing we should add a few more tests to make sure.

For future reference, for me and others, the relevant CommonMark test cases are around the case below in https://spec.commonmark.org/0.30/spec.json

  {
    "markdown": "&nbsp; &amp; &copy; &AElig; &Dcaron;\n&frac34; &HilbertSpace; &DifferentialD;\n&ClockwiseContourIntegral; &ngE;\n",
    "html": "<p>  &amp; © Æ Ď\n¾ ℋ ⅆ\n∲ ≧̸</p>\n",
    "example": 25,
    "start_line": 650,
    "end_line": 658,
    "section": "Entity and numeric character references"
  }

@nhirschey nhirschey self-assigned this Apr 7, 2023
@DavidSSL

DavidSSL commented Apr 7, 2023

You're welcome @nhirschey. I was inspired by your Amplifying FSharp talk to contribute back even though I don't use FSharp.Formatting :).

Out of curiosity, since I am not so familiar with this domain: for the CommonMark tests above, is this the result you'd be expecting?

image

@nhirschey
Collaborator

nhirschey commented Apr 12, 2023

@DavidSSL that's very kind of you to say! It'd be great to have your help. I believe your example is rendering correctly.

One thing you could try: there are tests for the CommonMark spec in this library, but the library does not pass all of them, so some of them are disabled. It looks like the "Entity and numeric character references" tests are currently disabled. If you want, perhaps try enabling them by adding them here?

let enabledSections = [ "Fenced code blocks"; "Indented code blocks"; "Paragraphs"; "Precedence"; "Tabs" ]

If dotnet test runs without any failures after you add "Entity and numeric character references" to the enabled sections, then I'd say make a pull request with your change and we'd be good to close this issue.
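To make the idea concrete, here is a minimal sketch of how a list like `enabledSections` could select cases from spec.json. It is not this library's actual test harness: `sampleSpec` is a hypothetical three-case stand-in for the real file, and `enabledExamples` is a name made up for illustration.

```fsharp
open System.Text.Json

// Hypothetical stand-in for a slice of spec.json (the real file is much larger).
let sampleSpec = """[
  { "markdown": "&copy;\n", "html": "<p>©</p>\n", "example": 1, "section": "Entity and numeric character references" },
  { "markdown": "aaa\n\nbbb\n", "html": "<p>aaa</p>\n<p>bbb</p>\n", "example": 2, "section": "Paragraphs" },
  { "markdown": "***\n", "html": "<hr />\n", "example": 3, "section": "Thematic breaks" }
]"""

// The existing list plus the section under discussion.
let enabledSections =
    [ "Fenced code blocks"
      "Indented code blocks"
      "Paragraphs"
      "Precedence"
      "Tabs"
      "Entity and numeric character references" ]

/// Returns the example numbers whose "section" is in the given list.
let enabledExamples (sections: string list) (json: string) =
    use doc = JsonDocument.Parse(json)
    doc.RootElement.EnumerateArray()
    |> Seq.filter (fun (el: JsonElement) ->
        List.contains (el.GetProperty("section").GetString()) sections)
    |> Seq.map (fun (el: JsonElement) -> el.GetProperty("example").GetInt32())
    |> Seq.toList // force evaluation before the document is disposed

printfn "%A" (enabledExamples enabledSections sampleSpec)
```

With the sample data, the "Thematic breaks" case is skipped and the other two are selected, so adding a section name to the list is all it takes to pull its cases into the run.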

@DavidSSL

DavidSSL commented Apr 14, 2023

@nhirschey, you will have to give me some time to look at this because I'm not so familiar with the domain and the code. Thanks for pointing me in the right direction though. I should be able to figure things out.

Having said that, the following:

"markdown": "&nbsp; &amp; &copy; &AElig; &Dcaron;\n&frac34; &HilbertSpace; &DifferentialD;\n&ClockwiseContourIntegral; &ngE;\n",
"start_line": 4976,
"section": "Entity and numeric character references",
"html": "<p>  &amp; © Æ Ď\n¾ ℋ ⅆ\n∲ ≧̸</p>\n",
"example": 289,
"end_line": 4984

does not look correct at all. I would assume that the HTML should be like what I have in the post above. Correct?

Moreover, it would appear that #464 is actually incorrectly implemented.

[<Test>]
let ``Don't double encode HTML entities outside of code`` () =
    "a &gt; & &copy; b"
    |> Markdown.ToHtml
    |> should contain "<p>a &gt; &amp; &copy; b</p>"

because when I run this via dingus, I get:

image

As you can see, I do need further guidance in terms of expected behaviour.

@DavidSSL

@nhirschey I think that I understand the problem space better. However, that is a bigger piece of work than envisaged. Basically, if you use the https://spec.commonmark.org/0.30/spec.json, tests belonging to the Fenced code blocks and Tabs sections also start breaking.

I will certainly give it a go but it could be quite a slog.

@nhirschey
Collaborator

Thanks for digging into this @DavidSSL.

For sure it's too much to get this library fully complying with the commonmark spec in one go; that's a huge task. But it would be awesome if you happen to find a bite-size chunk that you can fix to help push us towards that goal.

No pressure, no rush. Even simply through your investigation here I've learned some things, thank you.

Regarding,

 [<Test>] 
 let ``Don't double encode HTML entities outside of code`` () = 
     "a &gt; & &copy; b" 
     |> Markdown.ToHtml 
     |> should contain "<p>a &gt; &amp; &copy; b</p>" 

I agree that it should contain "<p>a &gt; &amp; © b</p>".

Some existing tests account for "improper" markdown parsing, and they will break when the parsing is better. When I wrote the below test for emphasis based off the commonmark spec, I knew the actual correct value should be <p>a*&quot;foo&quot;*</p> but I was focused on fixing emphasis, not quotes.

let ``No emphasis if opening * is preceded by alphanumeric and followed by punctuation`` () =
    let doc = """a*"foo"*"""
    let actual = """<p>a*"foo"*</p>""" + "\r\n" |> properNewLines
    Markdown.ToHtml doc |> shouldEqual actual
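For anyone following along, the two expected values above both fall out of the same rule: decode entity references to characters first, then escape only `&`, `<`, `>` and `"` on output. Below is a rough sketch of that rule, not how FSharp.Formatting implements it. `escapeText` and `renderParagraph` are hypothetical helpers, and the sketch ignores code spans, backslash escapes and every other inline construct. It leans on `System.Net.WebUtility.HtmlDecode`, which as far as I know only covers the older HTML 4 entity names, so something like `&HilbertSpace;` would need a fuller entity table.

```fsharp
open System.Net

/// Escapes the four characters that the spec's expected HTML escapes in text.
let escapeText (s: string) =
    s.Replace("&", "&amp;")   // must run first so later entities are not re-escaped
     .Replace("<", "&lt;")
     .Replace(">", "&gt;")
     .Replace("\"", "&quot;")

/// Hypothetical helper: decode entity references, then escape for output.
let renderParagraph (markdown: string) =
    let decoded = WebUtility.HtmlDecode(markdown)
    sprintf "<p>%s</p>" (escapeText decoded)

printfn "%s" (renderParagraph "a &gt; & &copy; b")
printfn "%s" (renderParagraph "a*\"foo\"*")
```

Under this rule `&copy;` comes out as the literal © character while the bare `&` becomes `&amp;`, and the quotes in the emphasis example come out as `&quot;`.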

@DavidSSL

@nhirschey things are clear now. I'll create tickets and link back to this issue to try and move the needle towards compliance. I might not succeed but I'll sure try and give it a go.
