J-Novel Club

    Simple epub to xhtml Converter, for Offline Browsers (Requires Linux)

      JonathanTaconator Member last edited by

      I didn't want to bother with other programs to read my ebooks, so I wrote a converter script on Linux that unpacks and reformats epub books from J-Novel Club into a single file I can read in a browser on any device. I thought I'd share my code in case anyone is interested, though I have no idea if anyone will find it useful.

      Features:

      • Should be fully automatic with minimal setup on Linux platforms.
      • Flattens the entire book into one browser compatible file.
      • Automatically creates chapter navigation.
      • Embeds a stylesheet for common elements (dark mode). To modify it, either open full.xhtml in a text editor, modify the script below, or include another stylesheet in the same folder.
      • Basic profanity filter (prioritizes readability of the result over removing everything).

      Potential Issues:

      • I have not tested this on any other devices; use at your own risk.
      • I have not tested this on any other series.
      • A note on security: don't paste random code into your command line unless you are sure it won't hurt you. This script is pretty transparent, so it should be easy enough to validate.
      • Unfortunately, at the time of writing, a number of common mobile browsers do not allow loading separate images. I still prefer reading on my phone and just switch to a photo viewer whenever there is an insert.

      Steps:

      • Ensure the relevant commands are available on your system (I have only tested this with Ubuntu on Windows Subsystem for Linux). You may need to install unzip using the command below or, alternatively, unzip the epub manually (rename it to .zip and extract as usual) into a folder of your choice.
      sudo apt install unzip
      
      • Modify the first two lines of the recipe so that they target the appropriate epub file. The example takes a part number and a volume number, e.g. "make 4-3" prints "Processing p4-v3" for the first line. If this is confusing, it will be faster to unzip the epub manually and remove those two lines.
      • Copy the following script into a file named Makefile, keeping the tab at the start of each recipe line (or replace "$*" with a folder of your choice and paste the commands into your command line).
      • Place the epub file (or extracted folder) in the same directory as your Makefile and run "make <name of folder>". For example, to convert "ascendance-of-a-bookworm-part-3-volume-1.epub" I would run "make 3-1", which creates the folder "3-1" and places the relevant files there.
      • If everything went well, the folder should now contain a bunch of images and full.xhtml, which should open in most browsers.
      %:
      	echo "$*" | sed 's/\([0-9]*\)-\([0-9]*\)/Processing p\1-v\2/'
      	echo "$*" | sed 's/\([0-9]*\)-\([0-9]*\)/ascendance-of-a-bookworm-part-\1-volume-\2.epub/' | xargs unzip -d $* > discard.me
      	grep $*/OEBPS/Text/cover.xhtml -e "utf" -e "DOCTYPE" -e "<html" -e "head" -e "meta" -e "title" -e "link" > sb
      	echo "<style>\nbody {\n line-height: 1.2em;\n font-size: 1em;\n overflow-wrap: break-word;\n background-color: #222;\n color: white;\n font-family: Lato, sans-serif;\n font-size: 110%;\n user-select: none;\n cursor: none;\n margin: 0em;\n padding: 1em;\n}\n\nimg {\n max-width: 100%;\n}\n\n.main { \n  font-weight: normal; \n  letter-spacing: 0; \n  orphans: 1; \n  widows: 1; \n  word-spacing: 0; \n}\n\np { display: block;\n margin-top: 0em;\n margin-bottom: 0.5em;\n margin-left: 0em;\n margin-right: 0em;\n text-indent: 18pt;\n }\n\np.signature {\n text-align: right;\n}\n\nblockquote {\nmargin-top: 1em;\nmargin-bottom: 1em;\nmargin-left: 1em;\nmargin-right: 1em;\n}\n\nblockquote p {\nmargin-left: 0;\nmargin-right: 0;\n}\n\nli {\nfont-size: 1em;\nmargin-top: 6pt;\n}\n\nli p {\ntext-indent: 0em;\n}\n\nul {\nmargin-top: 1em;\nmargin-bottom: 1em;\n}\n\nol {\nmargin-top: 1em;\nmargin-bottom: 1em;\ntext-align: left;\n}\n\nh1 {\nfont-size: 1.55em;\nmargin-top: 10em;\nmargin-bottom: 1em;\nline-height: 1.2em;\ntext-indent: 20pt;\n}\n\nh2 {\nfont-size: 1.15em;\nmargin-top: 1.5em;\nmargin-bottom: .5em;\nline-height: 1.2em;\n}\n\ntable\n{\nmargin-top: 1.5em;\nmargin-bottom: 1.5em;\nfont-size: 0.9em;\nborder-collapse: collapse;\n}\n\ntr td\n{\nvertical-align: top;\npadding: 0.2em;\n}\n\ncode {\nfont-family: Consolas,\"courier new\",monospace;\n}\n</style>\n<body>" >> sb
      	grep -v -e "signup.xhtml" -e "toc.xhtml" $*/OEBPS/content.opf | grep -e "<itemref" | sed -z 's/cover"/cover.xhtml"/; s/    <itemref idref="/$*\/OEBPS\/Text\//g; s/"\/>\r*\n/ /g' | xargs cat | grep -e "<section" -e "<div" -e "</section" -e "</div" -e "<h1" -e "<h2" -e "<p" -e "<img" > t.xhtml
      	echo "<nav><h2>Table of Contents</h2><ol start=\"0\" epub:type=\"list\">" > n.xhtml
      	grep -e '<section' -e '<div' -e 'h1' t.xhtml | sed -z -E 's/<section[^i>]*id="([^<"]*)">\s*<div class="main">\s*<h1>([^<]*)<\/h1>/<li class="toc-front"><a href="\#\1">\2<\/a><\/li>/g' | grep -e 'li class="toc-front"' >> n.xhtml
      	echo "</ol></nav>" >> n.xhtml
      	cat sb n.xhtml t.xhtml | sed 's/"..\/Images\//"/g; s/"..\/Styles\//"/g' | sed 's/the hell//g; s/as hell//g; s/Hell,/In fact,/g; s/The hell/What/g; s/a hell of a/quite a/g; s/hell /grief /g; s/damned //g; s/damning/heinous/g; ' > full.xhtml
      	echo "</body> </html>" >> full.xhtml
      	mv full.xhtml $*/.
      	rm -f sb t.xhtml n.xhtml discard.me
      	mv $*/OEBPS/Images/* $*/.
      	rm -r $*/OEBPS $*/META-INF $*/mimetype
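
For anyone unsure what the first two recipe lines do, here is a small stand-alone sketch of the same sed mapping that can be run in a shell before touching the Makefile. The function name epub_name is made up for illustration; the sed expression is copied from the recipe, and the file check at the end is my own addition.

```shell
#!/bin/sh
# Sketch: map a make target such as "4-3" to the epub filename the recipe unzips.
# epub_name is a hypothetical helper name, not part of the Makefile.
epub_name() {
    # \1 is the part number, \2 is the volume number
    printf '%s\n' "$1" |
        sed 's/\([0-9]*\)-\([0-9]*\)/ascendance-of-a-bookworm-part-\1-volume-\2.epub/'
}

target="4-3"
file=$(epub_name "$target")
echo "make $target would unzip: $file"

# Check that the epub is actually in the current directory before running make.
if [ -f "$file" ]; then
    echo "found $file"
else
    echo "missing $file (place it next to the Makefile first)"
fi
```

If the printed filename does not match your epub exactly, adjust the sed replacement text until it does, then copy the same change into the Makefile.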
      
        JonathanTaconator Member last edited by JonathanTaconator

        Addendum: I tested this on another series! I also fixed a bug, so sorry to anyone who tried this before :\
        The profanity filter has been upgraded, and the book name is now easier to change: everything you need to modify lives in the variables at the top. I included two versions of them for reference. Here is the updated second step:

        • Modify the folderNameFormat and epubNameFormat variables so that the appropriate epub file is targeted. The example takes a part number and a volume number, e.g. "make 4-3" prints "Processing 4-3: ascendance-of-a-bookworm-part-4-volume-3.epub" for the first line. For folderNameFormat, pick a folder name of your choice and replace each number in it (the volume and/or part number) with '$d'. Then paste the epub filename into epubNameFormat, replacing the first number with '\1' and subsequent numbers with '\2', '\3', and so on. Make sure the numbers appear in the same order in both (e.g. for Ascendance, part is \1 and volume is \2). The variables ending in "1" show the same setup for a second series; the recipe does not use them, so copy their values into the first pair to convert that series.
        d=\([0-9]*\)
        # '$d' represents a number, '\1' puts the first one back in, and 'v' or '-' are part of the folder name
        folderNameFormat= $d-$d
        epubNameFormat= ascendance-of-a-bookworm-part-\1-volume-\2.epub
        folderNameFormat1= v$d
        epubNameFormat1= taking-my-reincarnation-one-step-at-a-time-no-one-told-me-there-would-be-monsters-volume-\1.epub
        
        %:
        	echo "$*" | sed 's/$(folderNameFormat)/Processing $*: $(epubNameFormat)/'
        	echo "$*" | sed 's/$(folderNameFormat)/$(epubNameFormat)/' | xargs unzip -d $* > discard.me
        	grep $*/OEBPS/Text/cover.xhtml -e "utf" -e "DOCTYPE" -e "<html" -e "head" -e "meta" -e "title" -e "link" > sb
        	echo "<style>\nbody {\n line-height: 1.2em;\n font-size: 1em;\n overflow-wrap: break-word;\n background-color: #222;\n color: white;\n font-family: Lato, sans-serif;\n font-size: 110%;\n user-select: none;\n cursor: none;\n margin: 0em;\n padding: 1em;\n}\n\nimg {\n max-width: 100%;\n}\n\n.main { \n  font-weight: normal; \n  letter-spacing: 0; \n  orphans: 1; \n  widows: 1; \n  word-spacing: 0; \n}\n\np { display: block;\n margin-top: 0em;\n margin-bottom: 0.5em;\n margin-left: 0em;\n margin-right: 0em;\n text-indent: 18pt;\n }\n\np.signature {\n text-align: right;\n}\n\nblockquote {\nmargin-top: 1em;\nmargin-bottom: 1em;\nmargin-left: 1em;\nmargin-right: 1em;\n}\n\nblockquote p {\nmargin-left: 0;\nmargin-right: 0;\n}\n\nli {\nfont-size: 1em;\nmargin-top: 6pt;\n}\n\nli p {\ntext-indent: 0em;\n}\n\nul {\nmargin-top: 1em;\nmargin-bottom: 1em;\n}\n\nol {\nmargin-top: 1em;\nmargin-bottom: 1em;\ntext-align: left;\n}\n\nh1 {\nfont-size: 1.55em;\nmargin-top: 10em;\nmargin-bottom: 1em;\nline-height: 1.2em;\ntext-indent: 20pt;\n}\n\nh2 {\nfont-size: 1.15em;\nmargin-top: 1.5em;\nmargin-bottom: .5em;\nline-height: 1.2em;\n}\n\ntable\n{\nmargin-top: 1.5em;\nmargin-bottom: 1.5em;\nfont-size: 0.9em;\nborder-collapse: collapse;\n}\n\ntr td\n{\nvertical-align: top;\npadding: 0.2em;\n}\n\ncode {\nfont-family: Consolas,\"courier new\",monospace;\n}\n</style>\n<body>" >> sb
        	grep -v -e "signup.xhtml" -e "toc.xhtml" $*/OEBPS/content.opf | grep -e "<itemref" | sed -z 's/cover"/cover.xhtml"/; s/    <itemref idref="/$*\/OEBPS\/Text\//g; s/"\/>\r*\n/ /g' | xargs cat | grep -e "<section" -e "<div" -e "</section" -e "</div" -e "<h1" -e "<h2" -e "<p" -e "<img" > t.xhtml
        	echo "<nav><h2>Table of Contents</h2><ol start=\"0\" epub:type=\"list\">" > n.xhtml ; grep -e '<section' -e '<div' -e 'h1' t.xhtml | sed -z -E 's/<section[^i>]*id="([^<"]*)">\s*<div class="main">\s*<h1>([^<]*)<\/h1>/<li class="toc-front"><a href="\#\1">\2<\/a><\/li>/g' | grep -e 'li class="toc-front"' >> n.xhtml ; echo "</ol></nav>" >> n.xhtml
        	cat sb n.xhtml t.xhtml | sed 's/"..\/Images\//"/g; s/"..\/Styles\//"/g' | sed 's/the hell//g; s/Hell if I/No way I’d/g; s/as hell//g; s/Hell,/In fact,/g; s/Hell no/No way/g; s/The hell/What/g; s/Hell ye/Oh ye/g; s/HELL//g; s/a hell of a/quite a/g; s/hell /grief /g; s/like hell/like crazy/g; s/damned //g; s/damning/heinous/g; s/Damn it,/Really,/g; s/Damn it!//g; s/Damn it\.//g; s/Damn this/This/g; s/Damn them!//g; s/Damn!/Oh no!/g; s/Damn him!/How awful!/g; s/Damn you/You/g; s/Damn right/That’s right/g; s/Damn these/These awful/g; s/Damn,/Seriously,/g; s/damn//g; s/Tch\.//g; ' > full.xhtml
        	echo "</body> </html>" >> full.xhtml
        	mv full.xhtml $*/.
        	rm -f sb t.xhtml n.xhtml discard.me
        	mv $*/OEBPS/Images/* $*/.
        	rm -r $*/OEBPS $*/META-INF $*/mimetype
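
One thing to keep in mind when adding your own rules to the filter: sed applies the substitutions in the order they are written, so specific phrases need to come before the catch-all rules. A minimal sketch using a few rules copied from the script above (the function name soften and the sample sentences are made up for illustration):

```shell
#!/bin/sh
# A subset of the filter rules, in the same relative order as in the recipe.
# soften is a hypothetical helper name used only in this sketch.
soften() {
    sed 's/Damn it,/Really,/g; s/Damn!/Oh no!/g; s/a hell of a/quite a/g; s/hell /grief /g; s/damn//g'
}

# Specific phrase rules fire first, so the sentence stays readable.
echo 'Damn it, that was a hell of a fight.' | soften
# prints: Really, that was quite a fight.

# With the order reversed, the catch-all "hell " rule runs first and
# the more specific "a hell of a" rule never gets a chance to match.
echo 'That was a hell of a fight.' | sed 's/hell /grief /g; s/a hell of a/quite a/g'
# prints: That was a grief of a fight.
```

So if a new phrase comes out garbled, try moving its rule earlier in the list rather than rewriting it.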
        
          groxx Premium Member last edited by

          At the much fancier end of the spectrum, for anyone who uses Calibre (which includes me, begrudgingly - it's very far from my favorite application): https://manual.calibre-ebook.com/generated/en/ebook-convert.html

          Calibre includes a command-line tool, ebook-convert, that can convert epubs to HTML. I suspect it's what its web-reader uses.

          If you don't want to go through its custom install process (many Linux distro packages won't install correctly): flatpaks work pretty well for it, and isolate all its dependencies. It's nearly an ideal example app for why flatpaks are useful, though it is a pretty large download. https://askubuntu.com/questions/1352523/how-to-run-or-execute-the-ebook-convert-script-from-a-flatpak-install-of-calibre
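
A minimal sketch of a batch conversion, assuming ebook-convert is on your PATH (for the flatpak install, the Ask Ubuntu link above covers how to invoke it instead). The helper name out_name and the choice of the .htmlz output extension (Calibre's zipped-HTML format) are my own assumptions for illustration; ebook-convert picks the output format from the output file's extension.

```shell
#!/bin/sh
# out_name is a hypothetical helper: swap the .epub extension for .htmlz
out_name() {
    printf '%s\n' "${1%.epub}.htmlz"
}

if command -v ebook-convert >/dev/null 2>&1; then
    for f in ./*.epub; do
        # If there are no epubs here the glob stays literal; skip it.
        [ -e "$f" ] || continue
        ebook-convert "$f" "$(out_name "$f")"
    done
else
    echo "ebook-convert not found on PATH"
fi
```

The resulting .htmlz file is a zip archive, so it still needs to be extracted before a browser can open the HTML inside.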
