Shriram Krishnamurthi @shriram.bsky.social

I wrote this in my PL textbook in 2003 (hence XML) and still feel this was a missed opportunity. Semi-structured output w/ tags would also make your code more robust to variations in Unix platforms. I disagree with the article's call for "datatypes"; down that road comes mutual incomprehension.

sep 2, 2025, 12:06 pm • 2 0
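
A minimal sketch of the robustness point, in Python. The tagged listings, the `file` element, and its `name`/`size`/`owner` attributes are all hypothetical, invented for illustration; they are not the format from the textbook excerpt. The idea shown is only that a consumer reading fields by name keeps working when platforms order or extend the fields differently, while a consumer reading by column position does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged listings from two platforms (assumed format).
# Platform B reorders the fields and adds an extra one.
platform_a = '<listing><file name="notes.txt" size="120"/></listing>'
platform_b = '<listing><file size="120" owner="sk" name="notes.txt"/></listing>'

# The same information as plain whitespace-separated columns.
columns_a = "notes.txt 120"
columns_b = "120 sk notes.txt"


def sizes_by_name(tagged):
    """Read each file's size by attribute name, ignoring attribute order."""
    root = ET.fromstring(tagged)
    return {f.get("name"): int(f.get("size")) for f in root.iter("file")}


def size_by_column(line):
    """Read the size from the second column, as a positional script would."""
    return line.split()[1]


tagged_a = sizes_by_name(platform_a)
tagged_b = sizes_by_name(platform_b)
positional_a = size_by_column(columns_a)
positional_b = size_by_column(columns_b)  # picks up the owner, not the size
```

The tagged reader gives the same answer on both platforms; the positional reader silently returns the wrong field on the second one.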

Replies

Shriram Krishnamurthi @shriram.bsky.social

Perlis talks about this in the Foreword to SICP, but I think we could still do a bit better — especially when the "stream of bytes" view is *still available* (just don't ask for the semi-structured output).

sep 2, 2025, 12:09 pm • 0 0 • view
Jack Rusher @jackrusher.com

I like the idea of self-describing metadata (in this case as little state machines) more than hacking in a few predefined formats. 9p.io/sources/cont...

sep 2, 2025, 1:12 pm • 1 0 • view
Shriram Krishnamurthi @shriram.bsky.social

That's interesting, thanks. Certainly wouldn't mind if data had their own shebang to match that of control. (Racket's `#lang` is basically a modern variant of shebang that applies just as well to data as to control.)

sep 2, 2025, 8:46 pm • 3 0 • view
Shriram Krishnamurthi @shriram.bsky.social

I'm not sold on regular expressions being the way to do it, both since people don't understand their computational limits (StackOverflow: "why doesn't my regexp parse HTML?!?") and because they're hard to write correctly and to read (we actually have a study on this right now).

sep 2, 2025, 8:47 pm • 3 0 • view
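
To make the "computational limits" point concrete, here is a small Python sketch. The sample markup and variable names are made up for illustration. It contrasts a typical regular-expression attempt at pulling out an element's body with a tree parser (`xml.etree.ElementTree` from the standard library) on input that nests the same tag.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative input: a <div> nested inside another <div>.
doc = "<div>outer <div>inner</div> tail</div>"

# A common regex attempt: lazily match up to the first closing tag.
# It cannot pair opening and closing tags, so it stops too early.
match = re.search(r"<div>(.*?)</div>", doc)
regex_body = match.group(1)

# A tree parser tracks nesting, so the outer element is recovered whole.
root = ET.fromstring(doc)
tree_text = "".join(root.itertext())
nested_divs = len(root.findall("div"))
```

Here `regex_body` is `"outer <div>inner"`, which is cut off mid-structure, while the parser yields the full text `"outer inner tail"` and sees one nested `div`.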
Shriram Krishnamurthi @shriram.bsky.social

Trees have proven to be a really good point in the Chomsky hierarchy: almost all of the nice properties of regular, you get context-sensitive for not much work, you can have a nice bicameral syntax, etc., etc.

sep 2, 2025, 8:48 pm • 1 0 • view
pnwrainorshine.bsky.social @pnwrainorshine.bsky.social

I'm always impressed by what awk can do. I wonder what percentage of people use it these days. In the very minimal time I've spent with PowerShell, I've always left shaking my head, thinking awk could have been much more effective with semi-structured text.

sep 2, 2025, 6:42 pm • 1 0 • view
Shriram Krishnamurthi @shriram.bsky.social

I think awk is mostly only used by the 40+ set. It really is great for line-at-a-time. But so often I have to do things that at least slightly cross a line. Imagine an awk that was designed for json, say.

sep 2, 2025, 8:53 pm • 1 0 • view
pnwrainorshine.bsky.social @pnwrainorshine.bsky.social

On a case-by-case basis the data can be made to work better with awk, often boiling down to a few things (including writing emacs keyboard macros or functions). One is that `jq -c ...` will put a sequence of json objects each on one line.

sep 2, 2025, 9:38 pm • 1 0 • view
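
Once each JSON object sits on its own line, line-at-a-time processing in the awk style becomes straightforward. A rough Python sketch of that shape follows; the sample records and the `each_record` helper are hypothetical, and the input is assumed to already be one object per line.

```python
import json


def each_record(lines):
    """Yield one parsed JSON object per non-blank input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical input, one JSON object per line.
sample = [
    '{"name": "a.svg", "size": 120}',
    '{"name": "b.txt", "size": 7}',
    '',
    '{"name": "c.svg", "size": 300}',
]

# awk-style "pattern { action }": select matching records, emit one field.
big_svgs = [
    rec["name"]
    for rec in each_record(sample)
    if rec["name"].endswith(".svg") and rec["size"] > 200
]
```

With real data, `sample` could be replaced by `sys.stdin`, so the script sits at the end of a pipeline the way an awk program would.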
pnwrainorshine.bsky.social @pnwrainorshine.bsky.social

But yeah, I don't use awk a lot anymore for json. Before json tools were very good I would do a simplistic conversion to xml and use xpath and then work from there. These days I often load json into postgresql and use that excellent toolset. Even that's probably only for the over 40's!

sep 2, 2025, 9:44 pm • 1 0 • view
Shriram Krishnamurthi @shriram.bsky.social

It really frustrates me that I don't know a command pipeline for, say, json remotely as well as I know the Unix shell. I mean, it's fine, I fire up Racket and code away, but there's got to be a better way. I should probably learn jq and friends and quit my bellyaching.

sep 3, 2025, 12:05 am • 3 0 • view
Shriram Krishnamurthi @shriram.bsky.social

I'm careful to say "I don't know" as opposed to the proper Internet way of speaking, which is to say "there doesn't exist". (-:

sep 3, 2025, 12:05 am • 2 0 • view
Jack Rusher @jackrusher.com

I still use sed and awk on the command line, but with JSON I just end up live coding everything with an editor-connected Lisp REPL 🤷🏻‍♂️

sep 3, 2025, 9:08 am • 3 0 • view
James Brundage | MVP @mrpowershell.com

This opportunity was not missed, it was picked up by #PowerShell :-)

Get-ChildItem -Filter *.svg | Select-Xml -XPath //* | Where-Object { $_.Node.Name -eq 'path' } | Foreach-Object { $_.Node.D }

_Everything_ is an object in PowerShell, and the object pipeline is overpowered.

sep 2, 2025, 8:45 pm • 1 0 • view
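
For readers who don't use PowerShell, the following is a rough Python analogue of what that pipeline appears to do: walk every element of an SVG document, keep the `path` elements, and return each one's `d` attribute. It is a sketch, not a translation. The `path_data` helper and the sample SVG are made up for illustration, and it works on a string rather than on files in a directory.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace; ElementTree spells it in braces.
SVG_NS = "{http://www.w3.org/2000/svg}"


def path_data(svg_text):
    """Return the 'd' attribute of every <path> element, in document order."""
    root = ET.fromstring(svg_text)
    return [el.get("d") for el in root.iter(SVG_NS + "path")]


# Illustrative document: two paths (one nested in a group) and a circle.
sample_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><path d="M0 0 L10 10"/></g>'
    '<circle r="5"/>'
    '<path d="M5 5 H20"/>'
    '</svg>'
)

paths = path_data(sample_svg)
```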