Post by Shriram Krishnamurthi / Redsky

I wrote this in my PL textbook in 2003 (hence XML) and still feel this was a missed opportunity. Semi-structured w/ tags would also make your code more robust to variations in Unix platforms. I disagree with the article's call for "datatypes"; down that road comes mutual incomprehension.

sep 2, 2025, 12:06 pm • 2 0

Replies

Perlis talks about this in the Foreword to SICP, but I think we could still do a bit better — especially when the "stream of bytes" view is *still available* (just don't ask for the semi-structured output).

sep 2, 2025, 12:09 pm • 0 0

I like the idea of self-describing metadata (in this case as little state machines) more than hacking in a few predefined formats. 9p.io/sources/cont...

sep 2, 2025, 1:12 pm • 1 0

That's interesting, thanks. Certainly wouldn't mind if data had their own shebang to match that of control. (Racket's `#lang` is basically a modern variant of shebang that applies just as well to data as to control.)

sep 2, 2025, 8:46 pm • 3 0

I'm not sold on regular expressions being the way to do it, both since people don't understand their computational limits (StackOverflow: "why doesn't my regexp parse HTML?!?") and because they're hard to write correctly and to read (we actually have a study on this right now).

sep 2, 2025, 8:47 pm • 3 0
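
To make the "computational limits" point concrete, here is a minimal sketch in Python. The `<list>`/`<item>` document is made up for illustration and is not from the thread. A lazy regular expression cannot track nesting depth, so it stops at the first closing tag, while a tree parser recovers the actual structure.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical nested document, invented for this illustration.
doc = "<list><list><item>a</item></list><item>b</item></list>"

# The lazy pattern ends at the FIRST "</list>", so the capture for the
# outer element is cut off in the middle of the inner list.
match = re.search(r"<list>(.*?)</list>", doc)
regex_inner = match.group(1)

# A tree parser keeps track of nesting, so structural queries are direct.
root = ET.fromstring(doc)
direct_items = [el.text for el in root.findall("item")]  # children of the outer list only
all_items = [el.text for el in root.iter("item")]        # every item, at any depth

print(regex_inner)
print(direct_items, all_items)
```

A greedy pattern would happen to work on this one input, but it breaks as soon as two sibling lists appear, which is the general limitation being described.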

Trees have proven to be a really good point in the Chomsky hierarchy: almost all of the nice properties of regular, you get context-sensitive for not much work, you can have a nice bicameral syntax, etc., etc.

sep 2, 2025, 8:48 pm • 1 0

I'm always impressed by what awk can do. I wonder what percentage of people use it these days. The very minimal time I've spent with powershell I've always left shaking my head thinking awk could have been much more effective with semi-structured text.

sep 2, 2025, 6:42 pm • 1 0

I think awk is mostly only used by the 40+ set. It really is great for line-at-a-time. But so often I have to do things that at least slightly cross a line. Imagine an awk that was designed for json, say.

sep 2, 2025, 8:53 pm • 1 0

On a case by case basis the data can be made to work better with awk, often boiling down to a few things (including writing emacs keyboard macros or functions). One is that `jq -c ...` will put a sequence of json objects each on one line.

sep 2, 2025, 9:38 pm • 1 0
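
The one-object-per-line layout described here is what makes line-at-a-time tools usable on JSON. A rough Python sketch of the same idea follows. It assumes the input is a top-level JSON array; the helper name `to_json_lines` and the sample records are hypothetical.

```python
import json

def to_json_lines(text):
    """Yield each element of a top-level JSON array as one compact line."""
    for record in json.loads(text):
        # Compact separators keep each record on a single line with no padding.
        yield json.dumps(record, separators=(",", ":"))

# Made-up sample input, pretty-printed across several lines.
pretty = """
[
  {"name": "a.svg", "size": 120},
  {"name": "b.svg", "size": 4096}
]
"""

lines = list(to_json_lines(pretty))

# With one record per line, an awk-style "pattern then action" loop is easy:
big = []
for line in lines:
    record = json.loads(line)
    if record["size"] > 1000:       # pattern
        big.append(record["name"])  # action

print("\n".join(lines))
print(big)
```

Once each record sits on its own line, any line-oriented tool (awk, grep, sort) can at least select and count records, even if it cannot look inside nested values.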

But yeah, I don't use awk a lot anymore for json. Before json tools were very good I would do a simplistic conversion to xml and use xpath and then work from there. These days I often load json into postgresql and use that excellent toolset. Even that's probably only for the over 40's!

sep 2, 2025, 9:44 pm • 1 0

It really frustrates me that I don't know a command pipeline remotely as well as I do the Unix shell for, say, json. I mean, it's fine, I fire up Racket and code away, but there's got to be a better way. I should probably learn jq and friends and quit my bellyaching.

sep 3, 2025, 12:05 am • 3 0

I'm careful to say "I don't know" as opposed to the proper Internet way of speaking, which is to say "there doesn't exist". (-:

sep 3, 2025, 12:05 am • 2 0

I still use sed and awk on the command line, but with JSON I just end up live coding everything with an editor connected Lisp REPL 🤷🏻‍♂️

sep 3, 2025, 9:08 am • 3 0

This opportunity was not missed, it was picked up by #PowerShell :-) `Get-ChildItem -Filter *.svg | Select-Xml -XPath //* | Where-Object { $_.Node.Name -eq 'path' } | Foreach-Object { $_.Node.D }` _Everything_ is an object in PowerShell, and the object pipeline is overpowered.

sep 2, 2025, 8:45 pm • 1 0
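
For readers who don't use PowerShell, the shape of that pipeline can be approximated with Python generators. This is a sketch only, not a translation: it reads one in-memory SVG string (made up here) instead of files on disk, and the stage names `nodes`, `named`, and `attribute` are hypothetical. Each stage passes element objects, not text, to the next.

```python
import xml.etree.ElementTree as ET

# Made-up SVG document standing in for the *.svg files in the post.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g><path d="M0 0 L10 10"/></g>
  <rect width="5" height="5"/>
  <path d="M5 5 Z"/>
</svg>"""

def nodes(svg_text):
    """Yield every element in the document, in document order."""
    yield from ET.fromstring(svg_text).iter()

def named(elements, name):
    """Keep elements whose local tag name matches, ignoring the namespace."""
    for el in elements:
        # Namespaced tags look like "{namespace-uri}path"; keep the part after "}".
        if el.tag.rsplit("}", 1)[-1] == name:
            yield el

def attribute(elements, key):
    """Project each element onto one of its attributes."""
    for el in elements:
        yield el.get(key)

# Stages compose like a pipeline; structured objects flow between them.
path_data = list(attribute(named(nodes(SVG), "path"), "d"))
print(path_data)
```

Because every stage receives parsed elements, none of them has to re-parse text, which is the property the post credits to the object pipeline.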