Outpost Universe Forums

Topic started by: Hooman on February 26, 2019, 03:20:12 AM

Title: MSBuild - Shell Globs
Post by: Hooman on February 26, 2019, 03:20:12 AM
I came across an entry on MSBuild in the Visual Studio documentation:
How to: Select the files to build (https://docs.microsoft.com/en-us/visualstudio/msbuild/how-to-select-the-files-to-build?view=vs-2017)

It includes files in the project file using shell globs, such as * and the newer **. The ** syntax matches directories recursively, so all .cpp files in a subtree could be specified as:

If you have multiple projects in the solution, perhaps with source files in src/ and a test project in test/, you could divide up the .cpp files using:

I was thinking that using an include like that might mean fewer updates to the project file when new source files are added; everything would just work automatically. This is the MSBuild documentation, though, so I don't know whether the Visual Studio IDE supports such syntax in the project tree it shows.
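A quick way to see the difference between * and ** is a small Python sketch. It uses pathlib's globbing as a stand-in, not MSBuild itself, so path separators and edge cases may differ from MSBuild's documented wildcard rules; the directory layout is hypothetical.

```python
import tempfile
from pathlib import Path

def matches(root, pattern):
    """Paths under root matching a glob pattern, relative and sorted for stable output."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

# Hypothetical layout: a top-level file, app sources in src/, tests in test/.
LAYOUT = ["main.cpp", "src/app.cpp", "src/net/socket.cpp", "test/app_test.cpp"]
PATTERNS = ["*.cpp", "**/*.cpp", "src/**/*.cpp", "test/**/*.cpp"]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in LAYOUT:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    # Collect results before the temporary directory is deleted.
    results = {pattern: matches(root, pattern) for pattern in PATTERNS}

for pattern, found in results.items():
    print(f"{pattern:15} {found}")
```

Here *.cpp only sees main.cpp, while **/*.cpp also descends into src/net/ and test/ (the ** can match zero directories, so main.cpp is included too). Prefixing the pattern with src/ or test/ splits the files between the two projects.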
Title: Re: MSBuild - Shell Globs
Post by: Vagabond on February 28, 2019, 08:55:03 PM
Oddly enough, the CMake documentation actively discourages finding source files with globs. It makes a point of saying that you should specify each file individually. I think it had something to do with notifying the compilation and linking process that a file had been added or removed, but I didn't quite learn the details. It just stuck out to me as odd at the time, because I was thinking to myself, hey, I'll just grab all these files with *.cpp and *.h.

From the CMake documentation:

We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate. The CONFIGURE_DEPENDS flag may not work reliably on all generators, or if a new generator is added in the future that cannot support it, projects using it will be stuck. Even if CONFIGURE_DEPENDS works reliably, there is still a cost to perform the check on every rebuild.

I don't know whether this logic applies only to CMake or also extends to VS.

A bit off topic, but Visual Studio uses a virtual folder structure called filters. Files from different folders on disk can be put in the same filter, and they then appear in the same folder within Visual Studio. I've found this behaviour counter-intuitive and a source of frustration.

Title: Re: MSBuild - Shell Globs
Post by: Arklon on March 01, 2019, 11:03:19 PM
Globs are generally a bad idea for any production build with any build system if you want reproducible builds.
Title: Re: MSBuild - Shell Globs
Post by: Hooman on March 02, 2019, 01:38:40 PM
I remember seeing the suggestion in the CMake documentation to not use globs, and thought that was a rather strange suggestion.

It seems sensible to compile all the source files in your project as a default convention. If you had a source file that didn't need to be compiled, I'd expect it might get deleted, renamed, or moved elsewhere in the source tree so it wasn't mixed with the other source files.

Though that perhaps assumes a clean checkout. Things don't work quite as smoothly when switching between branches, particularly with work in progress. I've noticed that Git keeps new, uncommitted files in the working directory when switching to a branch that didn't originally contain them.

As for CMake, I think the problem is that the generated build files (the Makefile? What if the Makefile also used globs?) aren't regenerated when the main CMakeLists.txt file hasn't changed. If you list files explicitly, then CMakeLists.txt has to be edited whenever a source file is added, which triggers regeneration. That feels like a bit of a kludge, though. I think the problem could be solved the way lockfiles work.

Tools like Bundler (for RubyGems) and npm use lockfiles. You specify a list of package dependencies and, optionally, the required versions of those packages. When packages are downloaded, or during a manual update of dependencies, the tool writes a new lockfile listing every package and the exact version that was actually downloaded and configured. The lockfile is then committed to version control. Any later install with the lockfile present will use the versions specified in it. That means that even if you didn't specify exact versions in the main dependency file, you still get reproducibility from the lockfile. It's also much easier both to specify dependencies that way and to upgrade them. You only need to fiddle with version numbers to add restrictions if something breaks during a manual update; otherwise the tools handle all the messy version details for you.
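The lockfile behaviour described above boils down to a small rule: an explicit pin wins, otherwise reuse the locked version, otherwise take the newest available. The Python sketch below assumes that rule and uses hypothetical package names and versions; real tools like npm and Bundler also handle version ranges and transitive dependencies, which it ignores.

```python
def version_key(version):
    """Compare versions numerically, so "1.10.0" sorts above "1.9.0"."""
    return tuple(int(part) for part in version.split("."))

def resolve(requested, available, lock):
    """Return the new lock: package name -> exact version.

    requested: name -> pinned version, or None for "any version"
    available: name -> versions currently published
    lock:      name -> version recorded by an earlier install ({} if none)
    """
    new_lock = {}
    for name, pin in requested.items():
        if pin is not None:
            new_lock[name] = pin              # an explicit pin always wins
        elif name in lock:
            new_lock[name] = lock[name]       # reuse the recorded version
        else:
            new_lock[name] = max(available[name], key=version_key)  # newest
    return new_lock

# Hypothetical packages and versions.
available = {"libfoo": ["1.2.0", "1.9.0"], "libbar": ["2.0.6", "2.0.10"]}
requested = {"libfoo": None, "libbar": "2.0.6"}

first = resolve(requested, available, lock={})      # initial install writes the lock
available["libfoo"].append("1.10.0")                # a newer release appears later
again = resolve(requested, available, lock=first)   # install with lockfile present
updated = resolve(requested, available, lock={})    # manual update ignores the old lock
print(first, again, updated, sep="\n")
```

The second install reproduces the first exactly even though a newer libfoo exists; only the deliberate update picks it up.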

For CMake, the equivalent could be a lockfile listing all the source files found by the shell glob. If the current folder tree doesn't match the lockfile, regenerate.
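The glob-lockfile idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not anything CMake provides; the lockfile name and the JSON format are assumptions.

```python
import json
import tempfile
from pathlib import Path

def scan(root, pattern):
    """Glob result as a sorted list of relative paths, so it can be stored and compared."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

def write_lock(root, pattern, lockfile):
    """Record what the glob found at generate time."""
    lockfile.write_text(json.dumps(scan(root, pattern), indent=2))

def needs_regenerate(root, pattern, lockfile):
    """True if there is no lockfile yet or the tree no longer matches it."""
    if not lockfile.exists():
        return True
    return json.loads(lockfile.read_text()) != scan(root, pattern)

PATTERN = "src/**/*.cpp"

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "a.cpp").touch()
    lock = root / "sources.lock.json"  # hypothetical lockfile name

    no_lock_yet = needs_regenerate(root, PATTERN, lock)  # never generated
    write_lock(root, PATTERN, lock)                      # the "generate" step
    unchanged = needs_regenerate(root, PATTERN, lock)    # same files as recorded
    (root / "src" / "b.cpp").touch()                     # add a source file
    after_add = needs_regenerate(root, PATTERN, lock)    # tree differs from lock

print(no_lock_yet, unchanged, after_add)
```

Note that needs_regenerate still walks the tree on every build, which is the per-build check cost the CMake documentation mentions. What the lockfile adds is a stored, reviewable record of exactly which files went into the build.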

Anyway, that's my understanding of what's happening. Admittedly I don't know CMake all that well.

Arklon, if I've missed something there, please elaborate.
Title: Re: MSBuild - Shell Globs
Post by: Arklon on March 03, 2019, 01:31:26 PM
The biggest problem with using globs is that you don't control the order in which things are built, and it may in fact be undefined. That can lead to problems such as dependencies not being built in the correct order. Also, because any matching file gets picked up, the build is easy to "dirty": a stray .cpp file that isn't even tracked by Git could end up in it.

I have used globs before for simple one-off test projects out of sheer laziness, but I would never use them for something published on GitHub.
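Both concerns can at least be narrowed in the glob step itself. The Python sketch below is an illustration, not how CMake or MSBuild behave: it sorts the matches, since the order Python's glob functions return depends on the filesystem, and it optionally keeps only files that version control tracks. The `tracked` set is a hypothetical stand-in for the output of `git ls-files`, assumed to be collected separately.

```python
import tempfile
from pathlib import Path

def sources(root, pattern, tracked=None):
    """Glob for sources, then sort so the list never depends on the filesystem.

    tracked: optional set of repo-relative paths known to version control
    (hypothetically collected from `git ls-files`; not run here).
    """
    found = [p.relative_to(root).as_posix() for p in root.glob(pattern)]
    if tracked is not None:
        found = [f for f in found if f in tracked]  # drop stray untracked files
    return sorted(found)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    for name in ["b.cpp", "a.cpp", "scratch.cpp"]:  # scratch.cpp is the stray file
        (root / "src" / name).touch()

    tracked = {"src/a.cpp", "src/b.cpp"}  # hypothetical `git ls-files` output
    everything = sources(root, "src/*.cpp")
    clean = sources(root, "src/*.cpp", tracked)

print(everything)
print(clean)
```

Sorting makes the file list deterministic, but it does not by itself fix build order between dependent targets; that still has to come from declared dependencies.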
Title: Re: MSBuild - Shell Globs
Post by: Hooman on March 04, 2019, 12:29:32 AM
I don't know. The whole point of a Makefile is to specify dependencies, which should imply a build order (or at least a partial ordering). I imagine CMake would retain that aspect.

I suppose parallel builds are not completely deterministic, since only a partial ordering is defined. But a partial ordering is precisely a statement that certain variations don't matter. In C++, each compilation unit should be compilable independently; you'd expect to get errors if it weren't. For source code generators like Lex and Yacc, there'd be a build rule specifying the dependency to ensure the proper build order.
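The partial-ordering argument can be sketched with Python's standard graphlib module (Python 3.9+). The dependency graph below is hypothetical and written Makefile-style (target to prerequisites), including a Lex/Yacc generation step. Each batch contains targets with no remaining dependencies on each other, so they could be built in parallel in any order.

```python
from graphlib import TopologicalSorter

# Hypothetical Makefile-style graph: target -> set of prerequisites.
deps = {
    "parser.tab.c": {"parser.y"},             # generated by Yacc/Bison
    "lexer.c": {"lexer.l"},                   # generated by Lex/Flex
    "parser.tab.o": {"parser.tab.c"},
    "lexer.o": {"lexer.c", "parser.tab.c"},   # simplified: lexer needs the parser's token definitions
    "main.o": {"main.cpp"},
    "app": {"parser.tab.o", "lexer.o", "main.o"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # sorted only to make the output stable
    batches.append(ready)
    sorter.done(*ready)

for step, batch in enumerate(batches, 1):
    print(step, batch)
```

Reordering the entries in deps would not change which targets land in which batch. As long as the dependencies are declared, a glob that returns files in a different order shouldn't change what gets built before what.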

As for a "dirty" checkout with untracked files, I think it's desirable that they get compiled. Those untracked files are probably the ones you're working on. If not, they should be stashed or removed.

As far as I can see, the most serious case against using shell globs is that there's no lockfile mechanism. And although I'm less certain about this next point, I suspect that only really matters if the build files CMake generates contain a pre-built list of paths, rather than using shell globs themselves.