[Solved] Script for fixing spelling errors with codespell

Added by mrumph 6 months ago

The following script fixes spelling errors in 36 files in lighttpd 1.4:

spell-check.sh ================================

#!/bin/sh

#
# This script runs codespell against selected files.
#

CODESPELL_LOC=$(command -v codespell)
if test -z "${CODESPELL_LOC}"; then
    echo "This script requires codespell, which was not found." >&2
    exit 1
fi

CODESPELL_VER=$(codespell --version 2>&1)
echo "The codespell version is ${CODESPELL_VER}."

# Words codespell should leave alone (identifiers, file names, ambiguous fixes)
WHITE_LIST="alot,catched,comman,crypted,dieing,gir,happend,iff,maintainance,mut"
WHITE_LIST="${WHITE_LIST},ot,referer,respons,thru,unx,wan't"

#
# Scan for file-specific actions
#

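# Note: the for loop splits on whitespace, so file names containing spaces are not handled.
for FILENAME in $(git ls-files); do
    # skip directories; git ls-files is recursive
    test -d "${FILENAME}" && continue

    case "${FILENAME}" in

    *.h|*.c)
        #
        # Run codespell against specific file:
        # -d disables colors, -q 3 sets the quiet level,
        # -w writes fixes in place, -L gives the words to ignore
        #
        codespell -d -q 3 -w -L "${WHITE_LIST}" "${FILENAME}"
        ;;
    esac
done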

exit 0
==============================================================

The patch created by "git diff" has 576 lines.
It would need to be reviewed carefully,
but as far as I can tell it is clean, and the code still builds with "make".
The tricky part is the white list, which prevents collisions with variable names and the like.

Mike Rumph


Replies (4)

RE: Script for fixing spelling errors with codespell - Added by gstrauss 6 months ago

lighttpd was originally written by some good programmers who were not native English speakers, so the existence of some misspellings is not surprising.

With the whitelist you provided, codespell finds 30 misspellings, 6 of them in the imported lemon.c. However, it does not fix 'catched' in numerous places. [edit: I see that it is in your whitelist]

I'll review some of the output and may experiment with a modified whitelist.

Thanks.

RE: Script for fixing spelling errors with codespell - Added by mrumph 6 months ago

Thanks for reviewing.
"catched" gets translated to "caught" by codespell instead of "cached".
If I remember correctly, there were instances for both of these interpretations in the code.
The script doesn't fix all spelling errors, just those that can be fixed safely and automatically.
The items on the white list may conflict with variable names or have ambiguous corrections.
You can run a version of the script without the -w and -L options to see what further spelling errors can be found.
Some of the variables could be renamed if so desired.
For example, the code refers to a file called "maintainance.html".
So I added "maintainance" to the white list, but that also blocks the fix in the many places where it could be corrected to "maintenance".
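
One way to decide whether a word such as "maintainance" belongs on the white list is to list where it occurs and check by hand which hits are file names or identifiers and which are plain prose. The following is only a minimal sketch: it assumes a grep that supports -w (GNU and BSD grep do), and word_hits is a hypothetical helper name, not part of the script above.

```sh
#!/bin/sh
# word_hits WORD FILE...   (hypothetical helper, not part of spell-check.sh)
# Print "file:line:text" for every line that contains WORD as a whole word.
# /dev/null is passed as an extra file so grep prints the file name even
# when only one real file is given.
word_hits() {
    word=$1
    shift
    grep -n -w -- "$word" "$@" /dev/null
}
```

For example, word_hits maintainance $(git ls-files '*.c' '*.h') should list the "maintainance.html" references and the prose uses together, because "." ends a word for grep -w. The reviewer can then fix the prose uses by hand and leave the file name alone.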

Take care,

Mike Rumph

RE: Script for fixing spelling errors with codespell - Added by mrumph 6 months ago

Also, the "*.h|*.c)" case can be expanded to "*.h|*.c|*.sh|*.txt|NEWS".
I noticed that NEWS alone contains several spelling errors.
For the *.sh case, we might want to add "objext" to the white list.
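
The expanded case could be sketched as below. This is an illustration under assumptions, not a tested change to the script: wants_spellcheck is a hypothetical helper name, and the white list is the one from the original script with "objext" appended.

```sh
#!/bin/sh
# wants_spellcheck FILENAME   (hypothetical helper name)
# Return 0 if FILENAME matches the expanded pattern list, 1 otherwise.
# The NEWS pattern only matches a file named exactly NEWS at the top level.
wants_spellcheck() {
    case "$1" in
        *.h|*.c|*.sh|*.txt|NEWS) return 0 ;;
        *) return 1 ;;
    esac
}

# White list from the original script, with "objext" appended for the *.sh case.
WHITE_LIST="alot,catched,comman,crypted,dieing,gir,happend,iff,maintainance,mut"
WHITE_LIST="${WHITE_LIST},ot,referer,respons,thru,unx,wan't,objext"
```

Inside the loop over git ls-files, the call would then be something like: wants_spellcheck "${FILENAME}" && codespell -d -q 3 -w -L "${WHITE_LIST}" "${FILENAME}"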

Mike

RE: [Solved] Script for fixing spelling errors with codespell - Added by gstrauss 6 months ago

I reviewed the output and made some changes to correct spelling.

for FILENAME in $(git ls-files); do
    case "${FILENAME}" in
      *.h|*.c|*.txt|*.sh)
        [ -f "${FILENAME}" ] || continue
        codespell -d -q 3 "${FILENAME}"
        ;;
    esac
done
