HTML <br> not supported. #11

Closed
kingreatwill opened this issue Mar 28, 2020 · 7 comments

Comments

@kingreatwill

kingreatwill commented Mar 28, 2020

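import (
	"testing"
	md "github.com/JohannesKaufmann/html-to-markdown"
)

var html = `
<p>1. xxx <br/>2. xxxx<br/>3. xxx</p><p><span class="img-wrap"><img src="xxx"></span><br>4. golang<br>a. xx<br>b. xx</p>
`

func Test_md(t *testing.T) {
	converter := md.NewConverter("", true, nil)
	markdown, err := converter.ConvertString(html)
	if err != nil {
		t.Fatal(err) // fail instead of silently ignoring a conversion error
	}
	t.Log(markdown)
}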

output

1\. xxx 2\. xxxx3\. xxx

![](xxx)4\. golanga. xxb. xx

want

1. xxx 
2. xxxx
3. xxx

![](xxx)
4. golang
a. xx
b. xx
@JohannesKaufmann
Owner

@kingreatwill yeah true, that's still missing. Thanks for reporting. I will add that on the weekend.


For now, you can add a new rule for yourself. See examples/add_rule.

Something like this (although I haven't tested it yet):

	newline := md.Rule{
		Filter: []string{"br"}, // register <br>
		Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
			return md.String("\n\n") // return markdown
		},
	}

	conv := md.NewConverter("", true, nil)
	conv.AddRules(newline)

@kingreatwill
Author

Thank you! How should I deal with the "\." in the output?

@JohannesKaufmann
Owner

The escape character ("\") is added because otherwise characters in the text could be falsely interpreted as markdown, which causes many side effects. See issue 7. That's why the library escapes any "markdown-like" content in the original HTML, so that the final markdown is valid.

But I'm thinking about providing an option to disable that as many don't like it 🤷‍♂️


But the escaping does no harm to how it's rendered. It basically just disables the markdown:

(with escaping)

Markdown:

1\. xxx
2\. xxxx
3\. xxx

![](xxx)
4\. golang
a. xx
b. xx

Rendered:

1. xxx
2. xxxx
3. xxx

4. golang
a. xx
b. xx


(without escaping)

Markdown:

1. xxx
2. xxxx
3. xxx

![](xxx)
4. golang
a. xx
b. xx

Rendered:

  1. xxx
  2. xxxx
  3. xxx

4. golang
a. xx
b. xx
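
To see why "1." becomes "1\." in the output, the escaping step can be sketched roughly as follows. This is a simplified illustration and not the library's actual implementation; escapeListMarkers and its pattern are assumptions made for the example, and they cover only ordered-list markers:

```go
package main

import (
	"fmt"
	"regexp"
)

// orderedMarker matches a run of digits followed by ". " at the start of
// a line, which a markdown renderer would read as an ordered list item.
var orderedMarker = regexp.MustCompile(`(?m)^(\d+)\. `)

// escapeListMarkers (hypothetical) inserts a backslash before the dot so
// the renderer shows the text literally instead of starting a list.
func escapeListMarkers(text string) string {
	return orderedMarker.ReplaceAllString(text, `${1}\. `)
}

func main() {
	// Lines starting with digits are escaped; "a. xx" is left alone
	// because it is not a list marker in markdown.
	fmt.Println(escapeListMarkers("1. xxx\n2. xxxx\n4. golang\na. xx"))
}
```

A renderer drops the backslash again when displaying the text, which is why the escaped version looks like the original HTML content rather than like a list.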

@kingreatwill
Author

kingreatwill commented Mar 28, 2020

var html = jmap –histo[:live]

output

jmap –histo\$&:live\$&

want

jmap –histo[:live]

How to keep it as it is?

@kingreatwill
Author

Thank you!

@JohannesKaufmann
Owner

I just merged the changes. You should be able to update the library with go get -u github.com/JohannesKaufmann/html-to-markdown

  1. <br> should now be parsed without any extra rules
  2. the escaping for links ("[" and "]") was disabled and a test case was added

@kingreatwill please let me know if you find any other HTML snippets that don't output the correct result!

@kingreatwill
Author

Thank you very much!
