WIP: formatting publications on personal website.

2026-03-16 03:19:55 +01:00 · 2022-09-05 18:48:11 +02:00
parent 7a763f36d2
commit 8a3ad22d99
27 changed files with 47759 additions and 13499 deletions
--- a/content/posts/2022/hugo-short-codes.md
+++ b/content/posts/2022/hugo-short-codes.md
@@ -1,9 +1,9 @@
 ---
 title: "Hugo Short Codes"
 date: 2022-06-14T19:36:18+02:00
-draft: false
+draft: true
 toc: false
-tags: 
+tags:
  - hugo
  - code
 ---
@@ -73,4 +73,3 @@ railroad.Diagram("foo", railroad.Choice(0, "bar", "baz"), css=style)
 {{< python-svg dest="/images/posts/test.svg" title="This is a python-svg exmaple." >}}
 railroad.Diagram("foo", railroad.Choice(0, "bar", "baz"), css=style)
 {{< /python-svg >}}
-
--- a/content/posts/2022/latex-to-markdown.md
+++ b/content/posts/2022/latex-to-markdown.md
@@ -13,9 +13,9 @@ tags:
 Recently I started porting some of my latex articles to markdown as they would
 make a fine contribution to this website in simpler format. Making a simple
 parser python isn't that bad and I could have used [Pandoc](https://pandoc.org/index.html)
-but I wanted a particular format for rendering a hugo markdown page. So I
-prepared several regex-based functions in python to dereference and construct
-a hugo-compatible markdown file.
+but I wanted to keep formatting as simple as possible when rendering a hugo
+markdown page. So I prepared several regex-based functions in python to
+dereference and construct a hugo-compatible markdown file.

 ``` python3
 class LatexFile:
@@ -39,16 +39,18 @@ class LatexFile:
 ```

 The general process for converting a Latex document is outlined above. The
-principle here is to create a flat text source which we then incrementally
-format such that Latex components are translated correctly.
+principle here is to treat the document as a flat text source and process it
+incrementally, so that each Latex component is translated in turn and replaced
+by plain text with markdown syntax.


 ## Latex Components

 In order to structure the python code I created several named-tuples for
-self-contained Latex contexts such as figures, tables, equations, etc. then
-by adding a `markdown` property we can replace these sections with hugo
-friendly syntax using short-codes where appropriate.
+self-contained Latex contexts such as figures, tables, equations, etc. Then,
+by adding a `markdown` property to each, we get a collection of objects
+with which we can simply replace the corresponding latex code in a predictable
+manner.

 ``` python3
 class Figure(NamedTuple):
@@ -68,8 +70,85 @@ class Figure(NamedTuple):
            fig_str += "{{" + f'< figure src="{file}" width="500" >' + "}}\n"
        fig_str += (
            "{{"
-            + f'< figure src="{self.files[-1] if self.files else ""}" title="Figure {self.index}: {self.caption}" width="500" >'
+            + f'< figure src="{self.files[-1] if self.files else ""}" '
+            + f'title="Figure {self.index}: {self.caption}" width="500" >'
            + "}}\n"
        )
        return fig_str
 ```
+
+Notice that here we use a hugo short-code when representing the figure in
+markdown. This lets us set the width and other properties in a simpler and more
+systematic way.
+
+## Replacement Procedure
+
+As mentioned before, the replacement simply looks for sections in the source and
+directly replaces them with the appropriate markdown text. It is important to
+process the matches in reverse order, so that the character offsets of earlier
+sections remain valid while later ones are replaced.
+
+``` python3
+def replace_figures(self) -> None:
+    """Dereference and replace all figures with markdown formatting."""
+    fig_list = self.figures
+    fig_list.reverse()
+    for figure in fig_list:
+        self.tex_src = (
+            self.tex_src[: figure.span[0]]
+            + figure.markdown
+            + self.tex_src[figure.span[1] :]
+        )
+    for figure in fig_list:
+        self.tex_src = re.sub(
+            r"\\ref\{" + re.escape(figure.label) + r"\}",
+            str(figure.index),
+            self.tex_src,
+        )
+```
+
+Secondly, we also replace the latex references with plain text references. This
+means that, instead of using labels that are translated into numbers during
+compilation, we directly reference the figure number.
+
+``` python3
+@property
+def figures(self) -> List[Figure]:
+    """Parse the TeX source for figure environments."""
+    return [
+        Figure(
+            span=(begin.start(), stop.end()),
+            index=index + 1,
+            files=[
+                elem[1]
+                for elem in re.findall(
+                    r"\\includegraphics(.*)\{(.*)\}",
+                    self.tex_src[begin.start() : stop.end()],
+                )
+            ],
+            caption=self.first(
+                re.findall(
+                    r"\\caption\{(.*)\}",
+                    self.tex_src[begin.start() : stop.end()],
+                )
+            ),
+            label=self.first(
+                re.findall(
+                    r"\\label\{(.*)\}",
+                    self.tex_src[begin.start() : stop.end()],
+                )
+            ),
+        )
+        for index, (begin, stop) in enumerate(
+            zip(
+                re.finditer(r"\\begin\{figure\*?\}", self.tex_src),
+                re.finditer(r"\\end\{figure\*?\}", self.tex_src),
+            )
+        )
+    ]
+```
+
+The piece of python code above exemplifies how we capture all figures found in
+the latex source code and aggregate them in a list of named-tuples. Naturally
+this depends on the style used when writing latex, but I generally try
+to keep latex-code as simple and systematic as possible.