feat: Copy/archive input XML files into the output directory #4030
Open
kdrienCG wants to merge 31 commits into develop from feature/kdrienCG/archiveInputDeck
20b501c add functions to collect included XML files (kdrienCG)
3db8b1a add archiveInputDeck function (kdrienCG)
8afead3 call archiveInputDeck() in ProblemManager (kdrienCG)
25c325b modify archive directory name (kdrienCG)
1cf33d5 add tests for collectIncluded functions (kdrienCG)
a4b35d0 fix typo in archiveInputDeck documentation (kdrienCG)
c873915 fix typo in collectIncludedRecursive documentation (kdrienCG)
6efbe14 add missing header includes (kdrienCG)
140aa51 add filter on collectIncluded iteration (kdrienCG)
9ed7c03 add MPI rank 0 condition for archiveInputDeck call (kdrienCG)
5c30635 add output directory invariant for archiveInputDeck (kdrienCG)
0c5fb2d add tests for archiveInputDeck (kdrienCG)
bf1ca66 modify archive's logic to flatten inputs (kdrienCG)
6b35d1b strip metadata attributes from the archived XML (kdrienCG)
a61d8c7 sort XML attributes in the archived XML (kdrienCG)
8738084 copy schema.xsd to the archive (kdrienCG)
93e2806 uncrustify (kdrienCG)
dbfa0fb relocate archiveInputDeck call to generate the XSD schema (kdrienCG)
5a7c196 add command line option to trigger the archiving (kdrienCG)
d745bf6 add levels to archiving command line option (kdrienCG)
f8acdb4 Merge branch 'develop' into feature/kdrienCG/archiveInputDeck (kdrienCG)
c97394d remove surrounding characters in a comment (kdrienCG)
aa5a851 set default archive strategy level to 1 (kdrienCG)
7041f73 remove XSD schema generation (kdrienCG)
135e3a9 relocate archiving in the ProblemManager (kdrienCG)
377ebdf log archive's creation (kdrienCG)
d498048 add option to copy the XSD schema (kdrienCG)
6e229b8 modify order of schema candidates (kdrienCG)
c388276 add missing comma (kdrienCG)
81084b8 remove collectIncluded* methods (kdrienCG)
4eb17c7 add archive command line parameter to quick start example (kdrienCG)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| /* | ||
| * ------------------------------------------------------------------------------------------------------------ | ||
| * SPDX-License-Identifier: LGPL-2.1-only | ||
| * | ||
| * Copyright (c) 2016-2024 Lawrence Livermore National Security LLC | ||
| * Copyright (c) 2018-2024 TotalEnergies | ||
| * Copyright (c) 2018-2024 The Board of Trustees of the Leland Stanford Junior University | ||
| * Copyright (c) 2023-2024 Chevron | ||
| * Copyright (c) 2019- GEOS/GEOSX Contributors | ||
| * All rights reserved | ||
| * | ||
| * See top level LICENSE, COPYRIGHT, CONTRIBUTORS, NOTICE, and ACKNOWLEDGEMENTS files for details. | ||
| * ------------------------------------------------------------------------------------------------------------ | ||
| */ | ||
|
|
||
| /** | ||
| * @file ArchiveInputDeck.cpp | ||
| */ | ||
|
|
||
| #include "ArchiveInputDeck.hpp" | ||
|
|
||
| #include "common/GeosxConfig.hpp" | ||
| #include "common/Path.hpp" | ||
| #include "common/format/Format.hpp" | ||
| #include "common/logger/Logger.hpp" | ||
| #include "dataRepository/xmlWrapper.hpp" | ||
|
|
||
| #include <algorithm> | ||
| #include <chrono> | ||
| #include <ctime> | ||
| #include <filesystem> | ||
| #include <iomanip> | ||
| #include <sstream> | ||
| #include <system_error> | ||
|
|
||
| namespace geos | ||
| { | ||
|
|
||
| using namespace dataRepository; | ||
|
|
||
| namespace archiveInputDeck | ||
| { | ||
|
|
||
| namespace | ||
| { | ||
|
|
||
| string makeTimestamp() | ||
| { | ||
| auto const now = std::chrono::system_clock::now(); | ||
| auto const time_t_now = std::chrono::system_clock::to_time_t( now ); | ||
| std::ostringstream timestampStream; | ||
| timestampStream << std::put_time( std::localtime( &time_t_now ), "%Y%m%d_%H%M%S" ); | ||
| return timestampStream.str(); | ||
| } | ||
|
|
||
| void stripMetadataAttributes( xmlWrapper::xmlNode node ) | ||
| { | ||
| node.remove_attribute( xmlWrapper::filePathString ); | ||
| node.remove_attribute( xmlWrapper::charOffsetString ); | ||
|
|
||
| for( xmlWrapper::xmlNode child : node.children() ) | ||
| { | ||
| stripMetadataAttributes( child ); | ||
| } | ||
| } | ||
|
|
||
| void reorderTags( xmlWrapper::xmlNode rootNode, string_array const & tagOrder ) | ||
| { | ||
| xmlWrapper::xmlNode lastInserted; | ||
| for( string const & tagName : tagOrder ) | ||
| { | ||
| xmlWrapper::xmlNode tag = rootNode.child( tagName.c_str() ); | ||
| if( !tag ) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| lastInserted ? rootNode.insert_move_after( tag, lastInserted ) | ||
| : rootNode.append_move( tag ); | ||
|
|
||
| lastInserted = tag; | ||
| } | ||
|
|
||
| // ProblemManager's order list doesn't cover every XML tag available in GEOS, | ||
| // so the missing ones are placed after the ones it provides, | ||
| // sorted alphabetically. | ||
| stdVector< string > missingTags; | ||
|
|
||
| for( xmlWrapper::xmlNode const & tag : rootNode.children() ) | ||
| { | ||
| string const & tagName = tag.name(); | ||
|
|
||
| if( std::find( tagOrder.begin(), tagOrder.end(), tagName ) == tagOrder.end() ) | ||
| { | ||
| missingTags.push_back( tagName ); | ||
| } | ||
| } | ||
|
|
||
| std::sort( missingTags.begin(), missingTags.end() ); | ||
|
|
||
| for( string const & tagName : missingTags ) | ||
| { | ||
| xmlWrapper::xmlNode tag = rootNode.child( tagName.c_str() ); | ||
|
|
||
| if( tag ) | ||
| { | ||
| rootNode.append_move( tag ); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| void sortAttributes( xmlWrapper::xmlNode node ) | ||
| { | ||
| stdVector< std::pair< string, string > > attributes; | ||
| for( xmlWrapper::xmlAttribute attr = node.first_attribute(); | ||
| attr; | ||
| attr = attr.next_attribute() ) | ||
| { | ||
| attributes.emplace_back( attr.name(), attr.value() ); | ||
| } | ||
|
|
||
| std::sort( attributes.begin(), | ||
| attributes.end(), | ||
| []( std::pair< string, string > const & a, | ||
| std::pair< string, string > const & b ) | ||
| { | ||
| // the name attribute always comes first, regardless of alphabetical order | ||
| bool const aIsName = ( a.first == "name" ); | ||
| bool const bIsName = ( b.first == "name" ); | ||
| if( aIsName != bIsName ) | ||
| { | ||
| return aIsName; | ||
| } | ||
|
|
||
| // other attributes are sorted alphabetically | ||
| return a.first < b.first; | ||
| } ); | ||
|
|
||
| // pugixml has no method to move an attribute, so we remove every attribute | ||
| // and re-append the saved copies in sorted order | ||
| while( node.remove_attribute( node.first_attribute() ) ) | ||
| {} | ||
| for( auto const & attr : attributes ) | ||
| { | ||
| node.append_attribute( attr.first.c_str() ).set_value( attr.second.c_str() ); | ||
| } | ||
|
|
||
| for( xmlWrapper::xmlNode child : node.children() ) | ||
| { | ||
| sortAttributes( child ); | ||
| } | ||
| } | ||
|
|
||
| xmlWrapper::xmlDocument flattenXMLs( string_array const & fileNames ) | ||
| { | ||
| xmlWrapper::xmlDocument flatDoc; | ||
| xmlWrapper::xmlNode root = flatDoc.appendChild( "Problem" ); | ||
|
|
||
| for( string const & fileName : fileNames ) | ||
| { | ||
| xmlWrapper::xmlDocument doc; | ||
| xmlWrapper::xmlResult const result = doc.loadFile( fileName, true ); | ||
| GEOS_THROW_IF( !result, | ||
| GEOS_FMT( "Could not load XML file '{}': {}", fileName, result.description() ), | ||
| InputError ); | ||
| xmlWrapper::xmlNode docRoot = doc.getFirstChild(); | ||
|
|
||
| doc.addIncludedXML( docRoot ); | ||
|
|
||
| for( xmlWrapper::xmlNode & node : docRoot.children() ) | ||
| { | ||
| root.append_copy( node ); | ||
| } | ||
| } | ||
|
|
||
| return flatDoc; | ||
| } | ||
|
|
||
| void copySchemaToArchive( string const & archiveDir ) | ||
| { | ||
| std::filesystem::path const candidates[] = { | ||
| GEOS_SCHEMA_SOURCE_PATH, | ||
| GEOS_SCHEMA_INSTALL_PATH | ||
| }; | ||
|
|
||
| std::error_code ec; | ||
| for( std::filesystem::path const & source : candidates ) | ||
| { | ||
| if( source.empty() || !std::filesystem::is_regular_file( source, ec ) ) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| std::filesystem::path const destination = std::filesystem::path( archiveDir ) / "schema.xsd"; | ||
| std::filesystem::copy_file( source, | ||
| destination, | ||
| ec ); | ||
|
|
||
| if( ec ) | ||
| { | ||
| GEOS_WARNING( GEOS_FMT( "Failed to copy XSD schema to archive '{}': {}", | ||
| destination.string(), ec.message() ) ); | ||
| return; | ||
| } | ||
|
|
||
| GEOS_LOG_RANK_0( GEOS_FMT( "Archived XSD schema: {}", | ||
| getAbsolutePath( destination.string() ) ) ); | ||
|
|
||
| return; | ||
| } | ||
|
|
||
| GEOS_WARNING( "Could not locate the XSD schema for archiving" ); | ||
| } | ||
|
|
||
|
|
||
| } | ||
|
|
||
|
|
||
| void archiveInputDeck( string_array const & inputFileNames, | ||
| string const & outputDirectory, | ||
| string_array const & xmlTagOrder, | ||
| integer const level ) | ||
| { | ||
| if( level == 0 || inputFileNames.empty() || outputDirectory.empty() ) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| string const timestamp = makeTimestamp(); | ||
| string const archiveDir = joinPath( outputDirectory, "archive_inputFiles", timestamp ); | ||
| makeDirsForPath( archiveDir + "/" ); | ||
|
|
||
| xmlWrapper::xmlDocument flatDoc = flattenXMLs( inputFileNames ); | ||
| xmlWrapper::xmlNode root = flatDoc.getFirstChild(); | ||
|
|
||
| stripMetadataAttributes( root ); | ||
| reorderTags( root, xmlTagOrder ); | ||
| sortAttributes( root ); | ||
|
|
||
| string const inputArchiveFile = joinPath( archiveDir, "input.xml" ); | ||
| flatDoc.saveFile( inputArchiveFile ); | ||
|
|
||
| GEOS_LOG_RANK_0( GEOS_FMT( "Archived XML inputs: {}", | ||
| getAbsolutePath( inputArchiveFile ) ) ); | ||
|
|
||
| if( level >= 2 ) | ||
| { | ||
| copySchemaToArchive( archiveDir ); | ||
| } | ||
| } | ||
|
|
||
|
|
||
| } /* namespace archiveInputDeck */ | ||
|
|
||
| } /* namespace geos */ |
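
The two ordering rules applied before the archive is saved (tag order, then attribute order) can be sketched without pugixml. This is a simplified illustration, not the PR's implementation: `Attribute`, `orderTags` and `sortAttributePairs` are hypothetical names, tags are reduced to their names, and duplicate tags are not given any special treatment.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML attribute: a (name, value) pair.
using Attribute = std::pair< std::string, std::string >;

// Order top-level tag names: tags listed in tagOrder come first, in that order;
// tags absent from tagOrder follow, sorted alphabetically.
std::vector< std::string > orderTags( std::vector< std::string > tags,
                                      std::vector< std::string > const & tagOrder )
{
  // Position of a tag in tagOrder, or tagOrder.size() when it is not listed.
  auto rank = [&tagOrder]( std::string const & name )
  {
    return std::find( tagOrder.begin(), tagOrder.end(), name ) - tagOrder.begin();
  };

  std::stable_sort( tags.begin(), tags.end(),
                    [&rank]( std::string const & a, std::string const & b )
  {
    auto const rankA = rank( a );
    auto const rankB = rank( b );
    if( rankA != rankB )
    {
      return rankA < rankB;
    }
    // Equal ranks: both tags are unlisted (or identical), so compare names.
    return a < b;
  } );
  return tags;
}

// Sort attributes so that "name" comes first and the others follow alphabetically.
void sortAttributePairs( std::vector< Attribute > & attributes )
{
  std::sort( attributes.begin(), attributes.end(),
             []( Attribute const & a, Attribute const & b )
  {
    bool const aIsName = ( a.first == "name" );
    bool const bIsName = ( b.first == "name" );
    if( aIsName != bIsName )
    {
      return aIsName;
    }
    return a.first < b.first;
  } );
}
```

With a tag order of `Solvers, Mesh`, the tags `Outputs, Zebra, Mesh, Alpha, Solvers` come out as `Solvers, Mesh, Alpha, Outputs, Zebra`. Sorting on plain values keeps the rules easy to unit test independently of the XML library.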
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| /* | ||
| * ------------------------------------------------------------------------------------------------------------ | ||
| * SPDX-License-Identifier: LGPL-2.1-only | ||
| * | ||
| * Copyright (c) 2016-2024 Lawrence Livermore National Security LLC | ||
| * Copyright (c) 2018-2024 TotalEnergies | ||
| * Copyright (c) 2018-2024 The Board of Trustees of the Leland Stanford Junior University | ||
| * Copyright (c) 2023-2024 Chevron | ||
| * Copyright (c) 2019- GEOS/GEOSX Contributors | ||
| * All rights reserved | ||
| * | ||
| * See top level LICENSE, COPYRIGHT, CONTRIBUTORS, NOTICE, and ACKNOWLEDGEMENTS files for details. | ||
| * ------------------------------------------------------------------------------------------------------------ | ||
| */ | ||
|
|
||
| /** | ||
| * @file ArchiveInputDeck.hpp | ||
| */ | ||
|
|
||
| #ifndef GEOS_FILEIO_OUTPUTS_ARCHIVEINPUTDECK_HPP_ | ||
| #define GEOS_FILEIO_OUTPUTS_ARCHIVEINPUTDECK_HPP_ | ||
|
|
||
| #include "common/DataTypes.hpp" | ||
|
|
||
| namespace geos | ||
| { | ||
|
|
||
| namespace archiveInputDeck | ||
| { | ||
|
|
||
| /** | ||
| * @brief Archive the XML input deck (and optionally the XSD schema) into the | ||
| * output directory. | ||
| * @param inputFileNames Container of XML input file names to archive | ||
| * @param outputDirectory The output directory in which the archive is created | ||
| * @param xmlTagOrder The order of the XML tags in the XML archive file | ||
| * @param level Archiving strategy level: | ||
| * - 0: no archiving (returns immediately) | ||
| * - 1: XML inputs only (flattened into a single file) | ||
| * - 2: XML inputs + the XSD schema | ||
| * | ||
| * Merge the XML input files and every file they include (specified in | ||
| * the Included tag) into a single flat file. When @p level is at least 2, the | ||
| * XSD schema is also copied next to the flattened input. | ||
| */ | ||
| void archiveInputDeck( string_array const & inputFileNames, | ||
| string const & outputDirectory, | ||
| string_array const & xmlTagOrder, | ||
| integer level ); | ||
|
|
||
| } /* namespace archiveInputDeck */ | ||
|
|
||
| } /* namespace geos */ | ||
|
|
||
|
|
||
| #endif // GEOS_FILEIO_OUTPUTS_ARCHIVEINPUTDECK_HPP_ |
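
As a rough illustration of the documented levels, the sketch below lists the files an archive would contain for a given level. `plannedArchiveFiles` is a hypothetical helper that is not part of this PR; the directory and file names (`archive_inputFiles`, `input.xml`, `schema.xsd`) are taken from ArchiveInputDeck.cpp, and the timestamp is passed in rather than generated.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper (not in the PR): returns the paths that an archive at
// the given level is expected to contain, following the documented levels.
std::vector< std::filesystem::path > plannedArchiveFiles( std::filesystem::path const & outputDirectory,
                                                          std::string const & timestamp,
                                                          int const level )
{
  std::vector< std::filesystem::path > files;

  // Level 0 (or no output directory): nothing is archived.
  if( level == 0 || outputDirectory.empty() )
  {
    return files;
  }

  // Each run gets its own timestamped sub-directory.
  std::filesystem::path const archiveDir = outputDirectory / "archive_inputFiles" / timestamp;

  // Level 1: the flattened XML input only.
  files.push_back( archiveDir / "input.xml" );

  // Level 2 and above: the XSD schema is copied next to the flattened input.
  if( level >= 2 )
  {
    files.push_back( archiveDir / "schema.xsd" );
  }
  return files;
}
```

For example, `plannedArchiveFiles( "out", "20240101_120000", 2 )` yields `out/archive_inputFiles/20240101_120000/input.xml` followed by the matching `schema.xsd` path, while level 0 yields an empty list.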
This feature will be incredibly useful in our production environment. That said, I would personally keep the existing GEOS default behavior (no archiving) unchanged for regular usage, CI, etc.
This would be consistent with, for example, CSV output, which is not enabled by default.