Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add baseline for old projects #2543

Open
vv12131415 opened this issue Jun 28, 2019 · 10 comments
Open

Add baseline for old projects #2543

vv12131415 opened this issue Jun 28, 2019 · 10 comments

Comments

@vv12131415
Copy link
Contributor

I've found that it's hard introduce PHP_CodeSniffer to old projects because it finds a lot of issues that can't be fixed at once.
Currently, as a workaround I use something like this (the code is console command from laravel app)

CodeSnifferCommand.php

<?php

declare(strict_types=1);

namespace App\Console\Commands;

use Illuminate\Console\Command;
use App\Console\Traits\GitCommandHelper;
use Symfony\Component\Process\Process;

class CodeSnifferCommand extends Command
{
    use GitCommandHelper;

    /** @var string  */
    protected $signature = 'common:phpcs
                            {file? : file to sniff}
                            {--added : sniff all files that have added to the GIT-index}
                            {--untracked : sniff all GIT-untracked files}
                            {--all : sniff all changed files}
                            {--branch=origin/dev : sniff all filed that different between current branch and \'branch-name\'}
                            {--changed : sniff all GIT-modified files that have not yet been added to the index}';

    /** @var string  */
    protected $description = 'Code Sniffer';

    public function handle(): void
    {
        $command = sprintf('vendor/bin/phpcs %s', $this->getPathsToChangedFiles());
        $process = Process::fromShellCommandline($command, base_path());
        $process->run();
        $this->warn('put declare(strict_types = 1) by yourself please');
        $this->line(
            $process->getOutput()
        );
        $this->warn('put declare(strict_types = 1) by yourself please');
    }
}

CodeBeautifierCommand.php

<?php

declare(strict_types=1);

namespace App\Console\Commands;

use Illuminate\Console\Command;
use App\Console\Traits\GitCommandHelper;
use Symfony\Component\Process\Process;

class CodeBeautifierCommand extends Command
{
    use GitCommandHelper;

    /** @var string  */
    protected $signature = 'common:phpcbf
                            {file? : file to beautify}
                            {--added : sniff all files that have added to the GIT-index}
                            {--untracked : sniff all GIT-untracked files}
                            {--all : sniff all changed files}
                            {--branch=dev : sniff all filed that different between current branch and \'branch-name\'}
                            {--changed : sniff all GIT-modified files that have not yet been added to the index}';

    /** @var string  */
    protected $description = 'Code Beautifier';

    public function handle(): void
    {
        $command = sprintf('vendor/bin/phpcbf %s', $this->getPathsToChangedFiles());
        $process = Process::fromShellCommandline($command, base_path());
        $process->run();
        $this->line(
            $process->getOutput()
        );
    }
}

GitCommandHelper.php

<?php

declare(strict_types=1);

namespace App\Console\Traits;

use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Process\Process;
use LogicException;

/**
 * @property InputInterface $input
 */
trait GitCommandHelper
{
    protected function getPathsToChangedFiles(): string
    {
        $files = '';

        if ($this->option('all')) {
            $this->input->setOption('changed', true);
            $this->input->setOption('added', true);
            $this->input->setOption('untracked', true);
            $this->input->setOption('branch', true);
        }

        if ($this->argument('file')) {
            $files .= $this->argument('file') . ' ';
        }

        if ($this->option('changed')) {
            $changedFilesCommand = 'git diff --name-only';
            $changedFilesProcess = Process::fromShellCommandline($changedFilesCommand, base_path());
            $changedFilesProcess->run();

            $files .= $changedFilesProcess->getOutput();
        }

        if ($this->option('added')) {
            $addedCommand = 'git diff --name-only --cached';
            $addedProcess = Process::fromShellCommandline($addedCommand, base_path());
            $addedProcess->run();

            $files .= $addedProcess->getOutput();
        }

        if ($this->option('untracked')) {
            $untrackedCommand = 'git ls-files --others --exclude-standard';
            $untrackedProcess = Process::fromShellCommandline($untrackedCommand, base_path());
            $untrackedProcess->run();

            $files .= $untrackedProcess->getOutput();
        }

        if ($this->option('branch')) {
            $branchDiffCommand = 'git diff --name-only --diff-filter=d ' . $this->option('branch') . ' | grep \.php$ | grep -v \.blade.php$ ';
            $branchDiffProcess = Process::fromShellCommandline($branchDiffCommand, base_path());
            $branchDiffProcess->run();

            $files .= $branchDiffProcess->getOutput();
        }

        $filesArray = array_unique(explode(' ', trim(preg_replace("/[\n\r]/", ' ', $files))));
        $preparedResult = implode(' ', $filesArray);
        if ($preparedResult === '') {
            throw new LogicException('Changed not found');
        }

        return $preparedResult;
    }
}

and the doc's says something like this

- run `git fetch && php artisan common:phpcs --branch=origin/dev` to only show changes compared to `dev`, make sure it's `origin/dev` not just `dev`

But the approach that I've shown above is not fully correct, since I can't change all the methods in the file and this way I need to manually find issues that where introduced in current branch/feature.
Also the baseline will be useful for CI

as examples of baselines we can look at psalm and also I've opened similar issue at DaveLiddament/sarb#26

@jrfnl
Copy link
Contributor

jrfnl commented Jun 28, 2019

Related to #2094 and #2137 ?

@vv12131415
Copy link
Contributor Author

@jrfnl kinda, related to #2137 (from my code you can also add GitUntracked filter)
also looks like more flexible way for #2094, because the threshold is kinda set dynamically, also I will only see errors that where introduced.

The with baseline feature I will not only see the numbers of errors, but also filter out the old one.
This can also be used something like this (which for me is kinda sophisticated right now)

  1. create new feature branch
  2. create (git ignored) file that will contain all of the errors.
  3. build your thing
  4. run CodeSniffer and make somekind of smart (because I don't know right now how do you handle changed files where file lines shifted) diff with the file from 2
  5. see the result

there are 2 problems with the thing I've described above

  1. it's looks sophisticated
  2. you can't use phpcbf after this

@Ma-ve
Copy link

Ma-ve commented Jul 29, 2019

+1 for implementing a baseline feature. Looking at Psalm's implementation, that seems to a great way to gradually improve an existing code base.

@vv12131415
Copy link
Contributor Author

by the way - it is now implemented here https://github.com/DaveLiddament/sarb

you can now give a link to it

@timoschinkel
Copy link
Contributor

We have the desire to move our codebase from PSR-2 to PSR-12, but we are getting a fair amount of errors, mostly on the header structure. I'm not so keen on making a PR that changes a lot of files at once, as it interferes with the Git history a bit too much to my taste. So I'm very interested in a baseline feature. This would allow gradual adoption of a (new) code style without blocking existing work on our codebase.

I have been playing around a little bit to see if this is feasible within the current CodeSniffer codebase and I've come to a few realisations/conclusions; I think the typical location for this would be in the reporting part of the application, because that's were you will have all the information needed. But because the check if an issue is fixable and the subsequent fix is executed the moment \PHP_CodeSniffer\Files\File::addMessage() is called I think that is the only location to inject this functionality without extensive rewriting. At that point in the application flow there is no knowledge (yet) about the amount of errors and warnings per file and so the approach Psalm uses - indicating the amount of errors of a specific type per file - is not feasible. This would mean that a baseline file would need to have an indication about file, line number and type of message. As files change this might give false positives. That is not an issue in my opinion as I expect you to fix these kind of issues when you touch a file anyway.

The approach described above would make the baseline be taken into account when running phpcbf as well, as this is triggered from the same location in the code.

I have made a very simple proof of concept of how this could be implemented: https://github.com/timoschinkel/PHP_CodeSniffer/tree/poc/baseline

What I did was add a baseline property to the configuration. This property should always contain and instance of \PHP_CodeSniffer\Baseline\CheckerInterface and in this PoC two flavours of this interface exist: \PHP_CodeSniffer\Baseline\DisabledChecker that will always mark any message as not being ignored and \PHP_CodeSniffer\Baseline\DefinitionChecker that will read a baseline file and checks if the message should be ignored according to the baseline definition or not. This would make a --ignore-baseline parameter very simple; if enable the DisabledChecker. Ideally the Config object could be equipped with an actual getter to ensure a bit more type safety, but that is probably out of scope for this.

A few side notes to the PoC; I have tried to keep it PHP 5, but I probably still used a few PHP 7 features and I did not bother to follow the CodeSniffer codestyle. The names of properties and classes are also completely open for discussion. I did attempt to extract the functionality into a separate entity as much as possible in an attempt to centralise the functionality and make it easily reusable if a refactor/rewrite of the codebase should occur in the future. By using an interface there is no need to add more conditionals to the method.

My goal is to get a bit of a discussion going about this feature and whether this might be a viable approach. If so I am willing to free up some time to try and implement this.

NB I have looked into creating a filter, but from the code I got the impression that the filters live on a file level, not on an issue level.

@gsherwood
Copy link
Member

check if an issue is fixable and the subsequent fix is executed the moment \PHP_CodeSniffer\Files\File::addMessage() is called

That's not actually how it works. PHPCBF works using the reporting system. When you run PHPCBF, the PHPCBF report class is called to report on the file that was just checked. All fixing kicks off from there: https://github.com/squizlabs/PHP_CodeSniffer/blob/master/src/Reports/Cbf.php

At this point, the file has been checked and all errors and warnings are known. Ones that are to be ignored could be excluded at this point, although I don't know if the error should be excluded while adding it or after the file has been processed.

That's really not the issue for me though. The real issue is that for a feature like this to go in, it needs to be accurate. Simply matching error codes to line numbers is not going to do it. A single new line of code near the top of the file would throw out the entire file baseline and show all the file's error messages again. Similarly, you can't attach errors to tokens as the token numbers will change with even the slightest modification.

The bit to get right for a feature like this is detecting the pattern of the errors, so you can tell that a block of errors is the same as in the baseline, even if the line and token numbers have changed.

You could attempt a diff of the old and new file contents, but that would require VCS tools to be available (to big of an assumption), or for the old file content to be cached by PHPCS (too much data).

I think a better solution is to look at the relative token difference between error messages to determine if a block is still valid.

So if you find a series of errors like this:

Token 12 on line 7 T_FUNCTION : Multi-line function call not indented correctly; expected 8 spaces but found 12
Token 42 on line 16 T_COMMA : Only one argument is allowed per line in a multi-line function call

But on the next pass you see this:

Token 14 on line 7 T_OPEN_PARENTHESIS : Expected 0 spaces before opening parenthesis; 1 found
Token 27 on line 8 T_FUNCTION: Multi-line function call not indented correctly; expected 8 spaces but found 12
Token 57 on line 17 T_COMMA : Only one argument is allowed per line in a multi-line function call

You can tell that those second 2 errors are the same as the ones in the baseline, and can be ignored.

This kind of checking could only be done after the entire file has been processed, so that's when I think a baseline comparison should be kicking in.

@timoschinkel
Copy link
Contributor

That's not actually how it works.

A false assumption from my end, apologies. And thank you for adding some clarification so fast.

I will dive into the code some more and try to get a better understanding of it.

@iquito
Copy link

iquito commented Aug 25, 2020

Maybe the baseline functionality in Psalm and PHPStan can be an inspiration on how to implement it in PHPCS, and having a similar behavior as those two libraries could also be good because many people will use PHPCS with either Psalm and/or PHPStan.

@tdtm
Copy link

tdtm commented Mar 2, 2021

Maybe the baseline functionality in Psalm and PHPStan can be an inspiration on how to implement it in PHPCS

If I may add on that note: Psalm's is much better than PhpStan. With psalm, --update-baseline can easily be ran to remove anything that's been fixed from the baseline. You can use that even when there are still errors. It's an effective way to make sure no errors are added to the baseline; only new ones are removed.

With Phpstan, you can't just do a "remove only" baseline update. You can only generate a brand new baseline, and I found diffing to prevent additions to the baseline to be much harder.

@frankdekker
Copy link

frankdekker commented Jul 19, 2021

In reference to #3387

Problem:

Find a way to uniquely identify where a violation originated from.

Options:

  • Using line numbers is a no go as modifications to the start of the file will mess that up.
  • Using VCS to diff requires dependency with a vcs system.

Proposal:

  • Take the 1 or 2 lines of code (or the tokens) around the violation that's going to be baselined.
  • Normalize all whitespaces.
  • Calculate a hash of these lines of code/tokens.
  • Add that hash to the baseline file as signature.

When comparing against the baseline, calculate the same signature when a violation occurs and see if it matches.

Pro:

  • Changes at the start of the file should not affect the baselined violations.
  • Reduces (but not eliminates) the chance that a new issue is suppressed with in an existing file. This however depends on how many lines of code/tokens the signature is based on. More lines means better accuracy.
  • Small code reformat changes won't affect the signature.

Cons:

  • Changing code around a violation will change the signature and will break the suppression and suddenly will show the violation again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants