Skip to content

Add filter file support when creating Resource Groups.#18

Open
CCPCookies wants to merge 7 commits intocarbonengine:mainfrom
CCPCookies:filtering
Open

Add filter file support when creating Resource Groups.#18
CCPCookies wants to merge 7 commits intocarbonengine:mainfrom
CCPCookies:filtering

Conversation

@CCPCookies
Copy link
Collaborator

Changes

  • Filter file rules loading through legacy INI file format support added.
  • Filter logic matched from resfileserver and eve-resparser.
  • Documentation of filter file format added.
  • Documentation covering filter logic added.
  • Test coverage added for all known filter include scenarios.
  • Added new global filter rule which is useful for excluding .red files.
  • Filter logic for 'resfile' field not covered as it is covered by respaths logic.
  • New publicly exposed library function added to create Resource Groups with filtering.
  • Logic added to library function to allow skipping empty search directories.
  • Logic added to library to allow ascertaining compression size from remote filesystem.
  • Test coverage for library function added.
  • CLI extended to add Resource Group creation with filters.
  • Test coverage for CLI operation added.

Version

Rather than changing 'CreateFromDirectory' in library and 'create-group' in CLI. A new function and operation was added to create Resource Groups using filters. This was to keep the API stable as there are now other parties working with the origional commands. Version minor was bumped.

Extra

Filter ini file parsing logic is from PR16

Changes
-------
* Filter file rules loading through legacy INI file format support added.
* Filter logic matched from resfileserver and eve-resparser.
* Documentation of filter file format added.
* Documentation covering filter logic added.
* Test coverage added for all known filter include scenarios.
* Added new global filter rule which is useful for excluding .red files.
* Filter logic for 'resfile' field not covered as it is covered by respaths logic.
* New publicly exposed library function added to create Resource Groups with filtering.
* Logic added to library function to allow skipping empty search directories.
* Logic added to library to allow ascertaining compression size from remote filesystem.
* Test coverage for library function added.
* CLI extended to add Resource Group creation with filters.
* Test coverage for CLI operation added.

Version
-------
Rather than changing 'CreateFromDirectory' in library and 'create-group' in CLI.
A new function and operation was added to create Resource Groups using filters.
This was to keep the API stable as there are now other parties working with the origional commands.
Version minor was bumped.

Extra
-----
Filter ini file parsing logic is from PR16

#include "CreateResourceGroupFromFilterCliOperation.h"

#include <string>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


struct FilterFile
{
std::unordered_map<std::string, std::shared_ptr<Prefix>> prefixes;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE when you iterate over this std::unordered_map, you will not get the items returned in the insertion order.
I "think" you may want it to return items in insertion order as part of the ResourceFilter::SetFromFilterFileData() function, that is calling m_prefixPaths.push_back(), which is NOT guaranteed as currently implemented.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, nice catch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also add a test to enforce the order importance

/// @note No file filtering supported
Result CreateFromDirectory( const CreateResourceGroupFromDirectoryParams& params );

/// @brief Creates a ResourceGroup from a supplied filter files.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.
... from supplied filter files. (remove the a)
or
... from a supplied filter file. (change to singular, remove s)


ParseIncludeExcludeRules( globalFiltersStr, fileData.includeRules, fileData.excludeRules );

// Get section infomration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:
// Get section information

ParseIncludeExcludeRules( globalFiltersStr, fileData.includeRules, fileData.excludeRules );

// Get section infomration
for( const auto& sectionName : reader.Sections() )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE:
The INIReader will return Sections in alphabetical order, not the order they appear in the .ini file.
If you want to keep the sections in the order as defined in the file, you will have to read the file manually and find all the sections and then iterate over that list (instead of reader.Sections())

I.e. change it to do:
std::vectorstd::string sectionsInOrder = ManuallyReadIniFileSectionsInOrderExcludingDefault();
for( const auto& sectionName : sectionsInOrder)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I remember you saying this. I don't see a scenario where order of sections matter.

I would also not want to be doing any ini parsing manually, this needs to be supplied by a library. The lib you found so far appears to be sort of ok. Another annoying thing it does is removes all the casing from the section names, not huge but I'd prefer it didn't.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It's annoying that it lowercases everything.
It also cuts each sectionName to the first either 48 or 50 characters (can't remember which)
But this ini reader was the best fit, because it:

  • had vcpkg support
  • emulated the original python ini file implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for if order of sections matter.
What about if the same respath and prefix are present in two different [namedSections], but one of them has an [someFilter] (include) where the other has the same ![someFilter] (exclude filter)?
What happens there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will match as it would with previous system


1. Globally in the ``[DEFAULT]`` section using ``filter =`` .
2. Section locally in sections using ``filter =`` .
3. Semi locally to respath adding include/exclude rules to each path ``respaths = prefix1:/* [ include ] ![ exclude ]``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct,
but it renders strange when viewed in a browser (splits the line).
It would be better if the "respaths = prefix1:/* [ include ] ![ exclude ]" would be put on the line below.


2. Section local filters are combined with any filters specified in global filters.

3. respaths filters combine with both global and section filters and importantly these add for all subsequent paths. This is explained more in the following examples.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this supposed to be:
"...and importantly these add for all subsequent paths WITHIN THE SELECTED SECTION."
I.e. filter defined in [SectionA] will not also be applied to all entries in [SectionB] onwards.

Two paths will be tested for inclusion:

``#3`` will use ``respaths = prefix1:/*`` and combine global and section local patterns ``include1`` and ``include2``. This will match the following from the source files:
1. ``include1.txt``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an empty line before the "1. include1.txt", for it to render correctly in a browser.

2. ``include2.txt``

``#4`` will use ``respaths = prefix1:/* [ include3 ]`` which will extend the section local patterns to include ``include3``. This will match the following source files:
1. ``include1.txt``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line, so it renders correctly in a browser.

3. ``include3.txt``

``#5`` will use ``respaths = prefix2:/*`` and doesn't sepecify any include rules. It will apply the include rules that have been constructed for the section at this point ``include1``, ``include2`` and ``include3``. This may be suprising. So this will match the following source files:
1. ``Path/include3.txt``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line to it renders correctly in browser.


[exampleSection]
filter = [ include2 ] # 2. Section local include
respaths = prefix1:/* # 3. respath1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE this is not how multiple (multi-line) respaths are defined in existing res.ini files.
The correct example would look like:

[resCharacterMisc]
respaths = res:/Graphics/Character/Global/...
                  res:/Graphics/Character/Female/Skeleton/...
                  res:/Graphics/Character/Female/*
                  res:/Graphics/Character/Male/Skeleton/...
                  res:/Graphics/Character/Male/*
                  res:/Graphics/Character/Unique/...

Where the multi-line entries are within the SAME "respaths" attribute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I'll change the documentation. The actual tests don't do this, this is just documentation.

#include <unordered_set>

CreateResourceGroupFromFilterCliOperation::CreateResourceGroupFromFilterCliOperation() :
CliOperation( "create-group-from-filter", "Create a Resource Group from a filter files." ),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixing singular and plural

#include <Md5ChecksumStream.h>
#include <GzipCompressionStream.h>
#include <cctype>
#include "ResourceInfo/PatchResourceGroupInfo.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Result{ ResultType::FAILED_TO_INITIALIZE_RESOURCE_FILTER, errorMsg };
}

statusSettings.Update( StatusProgressType::PERCENTAGE, 0, 5, "Loading filter files" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This status line has 0, 5 like the line for "Create resource group from filters".
Should the numbers be updated (is this a copy-paste error)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch


if( inputDirectoryStatus.RequiresStatusUpdates() )
{
float step = static_cast<float>( 100.0 / searchPaths.size() );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The step variable is going to be the same in every iteration of the loop.
Can be calculated outside fo the for loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, this way that computation is skipped if not verbose, so we don't calculate something we don't need when not caring about the output.

}
else
{
return Result{ ResultType::INPUT_DIRECTORY_DOESNT_EXIST, inputDirectory.string() };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused.
Why would you ever not want to skip non existent input directories?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it only for testing/debug purposes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is due to real world usecase of reduced-resources.

It only syncs files that changed, so in theory it might (and usually does) not a single file in a search directory. If it's not synced then there is no directory. But this is not a fail case.


ss << "Processing file: "
<< filePathRelativeToInputDirectory.string()
<< ", Match filter: "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Match filter:" vs matchSection
I guess this is supposed to be "Match Section:" or "Match Section Id:"
Unless you also reference return the "current include/exclude filter from the CheckPath() function and return that as well. Might be useful for debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually was thinking to return the current line number for the path rule so that you can see really well. But not required any further information so didn't bother to skip some computation time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the wording is still wrong though, right?
This is supposed to be "section", not "filter", right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yeah, i've changed it to matched filter section

resourceParams.binaryOperation = ResourceTools::CalculateBinaryOperation( entry.path() );
}

Location l;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the name of the variable to be more descriptive?


#include <filesystem>
#include <vector>
#include <unordered_map>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports in alphabetical order


struct FilterPath
{
std::string sectionId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other structs / classes you've put an empty line between items.
Missing in this struct.


std::filesystem::path invalidPath = "File";

ASSERT_FALSE( resourceFilter.CheckPath( invalidPath ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check if something is valid


std::filesystem::path prefix2InvalidPath = "Path2/File";

ASSERT_FALSE( resourceFilter.CheckPath( prefix2InvalidPath ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to check for "SomethingValid" on both prefixes as well.


std::filesystem::path prefix2ValidPath = "Path2/File";

ASSERT_TRUE( resourceFilter.CheckPath( prefix2ValidPath ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about invalid checks in this test?


std::filesystem::path prefix2InvalidPath = "Path2/File";

ASSERT_FALSE( resourceFilter.CheckPath( prefix2InvalidPath ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add checks for "SomethingValid" on both prefixes as well.


std::filesystem::path invalidPath = "File";

ASSERT_FALSE( resourceFilter.CheckPath( invalidPath ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you check for "SomethingElse", will it match or not?

This change has been tested against all real branch data with no
obvious differences from the previous system it is to replace.

* Get index filter mapping from yaml file passed to CLI
* Reworked filter library processing to process many groups at once
* Filter logic now forced to lowercase to match previous tool
* Addressed previous draft PR feedback
@CCPCookies CCPCookies marked this pull request as ready for review March 11, 2026 16:08
// Load filter settings from file
std::filesystem::path basePathToFilterFiles = m_argumentParser->get<std::string>( m_filterFileBasePathId );

std::filesystem::path basePathRespourceFiles = m_argumentParser->get<std::string>( m_resourceFileBasePathId );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the French version "Respource" ?

}

// Load filter settings from file
std::filesystem::path basePathToFilterFiles = m_argumentParser->get<std::string>( m_filterFileBasePathId );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly commenting for myself:
So we have two basepaths:

  • FilterFiles => used to resolve paths defined in the .ini files
  • ResourceFiles => is that the basepath of the incoming files (from VCS), or is this something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have 3

  • prefixMapBasePath as well

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now know what each of them does, good documentation in the AddArgument() functions, but I started looking at the header file that was missing it, causing confusion.

m_compressedSize = resourceInfo.m_compressedSize;
}

m_uncompressedSize = resourceInfo.m_uncompressedSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are only some of the DocumentParameter member variables doing a .HasValue() checks?
All of them contain an underlying m_value of a std::optional type


std::string m_filterIndexMappingFileId;

std::string m_filterFileBasePathId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be so good to have comments on what each of those do.
I'm having a bit of a brain aneurysm, wrapping my head around the differences between the:

  • filterFileBasePathId
  • resourceFileBasePathId
  • prefixMapBasePathId


AddArgument( m_filterFileBasePathId, "Base path to filter files.", true, true, "" );

AddArgument( m_resourceFileBasePathId, "Base path for output resource files.", false, true, "" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be called outputResourceFileBasePathId or resourceOutputFileBasePathId instead?


AddArgument( m_exportResourcesDestinationPathId, "Represents the base path where the exported resources will be saved. Requires --export-resources", false, false, defaultImportParams.exportSettings.destinationSettings.basePath.string() );

AddArgument( m_prefixMapBasepathId, "Base directory for prefix mappings defined in filter files.", false, false, "" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a required=true

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you would never call CreateResourceFromFILTER unless you're passing in filter .ini file(s) and then the basepath for the prefix should be required, right?

}

std::cout << "Resource filter index mapping file: " << filterIndexMappingFilePath << std::endl;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it ouput the contents of the mapping file here as well?

}

auto resourceIter = resourceGroups.begin();
for( auto exportIter = exportParams.begin(); exportIter != exportParams.end(); exportIter++, resourceIter++ )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section would benefit from a short comment for clarity.


[exampleSection] # [OPTIONAL] Filter section
filter = [ include ] ![ exclude ] # [OPTIONAL] Section local include and exclude patterns
respaths = prefix1:/* # [OPTIONAL] Path pattern
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respaths is a single attributeName but multi-line value.


ResourceDestinationSettings exportResourcesDestinationSettings = { CarbonResources::ResourceDestinationType::LOCAL_CDN, "ExportedResources" };

uintmax_t fileStreamChunkSize = 20971520;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number, replace with a NAMED_CONSTANT

*/
struct CreateResourceGroupFromFilterParams
{
uintmax_t fileStreamChunkSize = 20971520;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number again


#include <yaml-cpp/yaml.h>

FilterIndexMappingFile::FilterIndexMappingFile()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty constructor and destructor.
Make it "default" in the header file
FilterIndexMappingFile() = default;
~FilterIndexMappingFile() = default;

bool FilterIndexMappingFile::LoadFromFile( const std::filesystem::path& path )
{

// TODO add returns for invalid file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO


bool ignoreCase = false;

if (documentVersion == VersionInternal(0, 0, 0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to know why this is needed on csv files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe created consts for 0.0.0 and 0.1.0
Similar to S_DOCUMENT_VERSION

int i = 0;
for( auto searchPath : searchPaths )
{
std::filesystem::path inputDirectory = params.filterSettings.prefixMapBasePath / searchPath;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this become just a single filename?
Asking because there are recursivDirectoryIterator for loops further down in this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about speed improvements in those cases.


std::filesystem::path entryPath = entry.path();

std::string entryPathString;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entryPathString is never appended to.
Where it is being used (to print out some status info) is only within RequireStatusUpdates() if statements blocks of their own.
I think you could just remove it and just use entryPath.string() instead in those cases.

continue;
}

if( filter->CheckPath( normalisedRelativePath, matchSection, matchPath ) )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ignoreCase==true and normalisedRelativePath has been made lower case.
Doesn't the filter->CheckPath need to make sure it does lowercase comparison as well?


Result RemoveResource( ResourceInfo& relativePath );

bool ResourceExists( ResourceInfo& resource );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is never used anywhere (as far as I can tell)

class ResourceFilter
{
public:
ResourceFilter();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be changed to be:
ResourceFilter() = default;

Same for destructor, as there is no implementation of either in the .cpp file (and it can be removed from those files)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants