Skip to content

Escape all literal dollar signs#3

Merged
angelhof merged 13 commits into
binpash:mainfrom
BolunThompson:escape_dollar
Dec 2, 2024
Merged

Escape all literal dollar signs#3
angelhof merged 13 commits into
binpash:mainfrom
BolunThompson:escape_dollar

Conversation

@BolunThompson

@BolunThompson BolunThompson commented Nov 22, 2024

Copy link
Copy Markdown
Contributor

In principle, should resolve binpash/pash#664. If c is a literal dollar sign, it should be parsed as one (with no need to account for escaping & variable interpolation). Yet, this fix is suspiciously simple — is there an edge case here?

The tests for PaSh still pass with this change.

Signed-off-by: Bolun Thompson <bolunthompson@ucla.edu>
@angelhof

Copy link
Copy Markdown
Member

Wow, that is indeed suspiciously simple, lol. Does it work for this problematic case (binpash/pash#664 (comment))? Also, as an experiment, what does the parsed AST of this example look like?

I am wondering why we need to diverge from the original libdash pretty printer though (https://github.com/binpash/libdash/blob/master/libdash/printer.py#L412). That printer seems to pass their roundtrip tests (https://github.com/binpash/libdash/tree/master/test). Could we somehow manage to run the roundtrip libdash tests here too? Is this issue exposed anywhere in the libdash tests? If not, should we add a test that exposes it?

@mgree thoughts on this? Am I missing something?

@angelhof

Copy link
Copy Markdown
Member

One more thought: I guess PaSh's usecase is special (and therefore maybe easier to handle) because when we call the pretty printer it is after the compiler is done and therefore everything has been expanded already, so any spare $ must be escaped?

@mgree

mgree commented Nov 22, 2024

Copy link
Copy Markdown

No, this looks like the correct printer fix. The libdash printer treats this correctly:

>>> import libdash
>>> cmds = list(libdash.parse('f'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['E', 36], ['C', 120]]]]], []]], 'echo "\\$x"\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "\\$x"'

I don't know how the shasta printer diverged from the libdash one...

@angelhof

angelhof commented Nov 23, 2024

Copy link
Copy Markdown
Member

I don't think the two printers diverge. I am seeing different behavior @mgree (after making libdash from source):

$ cat ../../../../../f
echo '$1'
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['C', 36], ['C', 49]]]]], []]], "echo '$1'\n", 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "$1"'
>>>

Maybe you had f be echo "\$x"? Like the one below?

$ cat f2 
echo "\$x"
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f2'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['Q', [['E', 36], ['C', 120]]]]], []]], 'echo "\\$x"\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo "\\$x"'
>>>

The issue here seems to be with the single quote, and the fact that $ in the single quote is added as C, even if followed by something. However, the fix that @BolunThompson proposed seems to lead to an issue with the following case (where $ is standalone):

$ cat f4
echo $ a
## This is with the fix!
$ python3
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libdash
>>> cmds = list(libdash.parse('f4'))
>>> cmds
[(['Command', [1, [], [[['C', 101], ['C', 99], ['C', 104], ['C', 111]], [['C', 36]], [['C', 97]]], []]], 'echo $ a\n', 0, 1)]
>>> libdash.to_string(cmds[0][0])
'echo \\$ a'

Having the fix, it now always escapes the character $, but if we have a word that only contains $, this is not correct.

Maybe the correct fix is to only escape the (C, $) on the printer if it is followed by something that could be interpreted as a variable by the shell (not only escaped characters as it is currently implemented, but any character). Is there any followup character to $ in which case we don't want to escape $ @mgree ?

A larger issue here IMO is that we have two separate printer implementations. One in libdash and one in shasta (without any tests). My suggestion would be to:
(1) remove libdash's printer and import shasta's printer if we want printer functionality in libdash (we could also remove printing in shasta's printer, but it feels like a better place to have the printer).
(2) add tests in shasta (both locally and in CI), that import libdash, and run libdash's tests using shasta's printer.
(3) add a few tests with corner cases with $ (at least the ones I wrote in this comment) to make sure we won't regress in the future. Maybe there are some failing libdash tests now that fail due to this?

Also, it seems that libdash's tests fail to catch this issue, because they do two roundtrips (and in the second and third this is interpreted as a variable):

$ cat ../test/failing/single-quote.sh
echo '$a'
echo $ a
$ ../test/round_trip.sh ./rt.py ../test/failing/single-quote.sh
PASS '../test/failing/single-quote.sh' (two runs to fixpoint)
## this is the output of `echo "$orig" "$rt" "$rtrt"`
/tmp/tmp.rgHMVGpsHK /tmp/tmp.y6EkOh0NEJ /tmp/tmp.6cK85M8VJb
$ cat /tmp/tmp.rgHMVGpsHK /tmp/tmp.y6EkOh0NEJ /tmp/tmp.6cK85M8VJb
echo "$a"
echo $ a

echo "${a}"
echo $ a

echo "${a}"
echo $ a

@mgree

mgree commented Nov 25, 2024

Copy link
Copy Markdown

Huh. My f was different, writing echo "\$x". I agree that C '$' should be escaped.

@angelhof

Copy link
Copy Markdown
Member

So I think the plan of action (to complete this PR) is the following. @BolunThompson let me know if you have any questions:

  1. Add tests in shasta that import and build the latest libdash and run its tests with the shasta printer.
  2. Add one new test (ideally in libdash too so that we have all tests at the same place):
echo '$1'
echo $ a

This test should not be allowed to run two roundtrips to pass, because the bug will be missed.
3. Do the fix that only escapes $ if it is followed by another character (it is possible down the line that we identify an even finer-grained behavior here, but no need to fix it prematurely).
4. Later, and in a separate PR, we can consider removing libdash's python printer completely and just use shasta's printer. There is no need to keep two printer implementations around.

@mgree

mgree commented Nov 28, 2024

Copy link
Copy Markdown

That plan sounds good to me!

@BolunThompson

Copy link
Copy Markdown
Contributor Author

I added a github action to run the new shasta tests on PR, mostly copied from PaSh. Someone who has experience with actions should review it before merging — I’ve never written one before.

@angelhof angelhof left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @BolunThompson, that is great! The workflow looks fine except that we are not keeping track of the exit code of the tests now. Other than this, I was thinking whether it makes sense to only import libdash for tests (through pip or github, or with github first and pip as a backup, or vice versa) instead of having it as a submodule. What do people think?

Comment thread .github/workflows/test.yml Outdated
Comment thread .gitmodules Outdated
Comment thread .github/workflows/test.yml Outdated
Prevents calling test setup every time the tests are run.
@angelhof angelhof merged commit a85f998 into binpash:main Dec 2, 2024
@angelhof angelhof mentioned this pull request Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

awk argument issues

3 participants