1
0
Fork 0

Adding upstream version 0.1.0.

Signed-off-by: Daniel Baumann <daniel@debian.org>
This commit is contained in:
Daniel Baumann 2025-05-30 18:55:23 +02:00
parent 550be5da31
commit 638d6148c0
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
12 changed files with 539 additions and 0 deletions

28
LICENSE Normal file
View file

@ -0,0 +1,28 @@
Copyright (c) 2022, Chris Koch <kopachris@gmail.com>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
(1) Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
(3)The name of the author may not be used to
endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

72
PKG-INFO Normal file
View file

@ -0,0 +1,72 @@
Metadata-Version: 2.1
Name: sortxml
Version: 0.1.0
Summary: A simple XML element sorter
Home-page: https://www.github.com/kopachris/sortxml
Author: Chris Koch
Author-email: kopachris@gmail.com
License: BSD 3-Clause License
Keywords: xml,sort
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
License-File: LICENSE
# sortxml - a simple XML element sorter
This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.
## Using `sort_xml()`:
Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:
```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```
### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key
### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)
## Usage on the command line:
Run `python -m sortxml -h` to display this help text.
Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr
A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:
$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl
### Positional arguments:
* _**input_file**_ File path to the source xml file.
* _**sort_xpath**_ XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ The name of the attribute to use as the sort key.
### Options:
* _**-h, --help**_ show this help message and exit
* _**-v, --version**_ show program's version number and exit
* _**-r, --reverse, --descending**_ Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)

57
README.md Normal file
View file

@ -0,0 +1,57 @@
# sortxml - a simple XML element sorter
This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.
## Using `sort_xml()`:
Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:
```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```
### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key
### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)
## Usage on the command line:
Run `python -m sortxml -h` to display this help text.
Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr
A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:
$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl
### Positional arguments:
* _**input_file**_ File path to the source xml file.
* _**sort_xpath**_ XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ The name of the attribute to use as the sort key.
### Options:
* _**-h, --help**_ show this help message and exit
* _**-v, --version**_ show program's version number and exit
* _**-r, --reverse, --descending**_ Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)

3
pyproject.toml Normal file
View file

@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

28
setup.cfg Normal file
View file

@ -0,0 +1,28 @@
[metadata]
name = sortxml
version = attr: sortxml.__version_str__
description = A simple XML element sorter
long_description = file: README.md
keywords = xml, sort
license = BSD 3-Clause License
license_files = LICENSE
url = https://www.github.com/kopachris/sortxml
author = Chris Koch
author_email = kopachris@gmail.com
classifiers =
Programming Language :: Python :: 3
License :: OSI Approved :: BSD License
Topic :: Text Processing :: Markup :: XML
Topic :: Utilities
[options]
py_modules = sortxml
install_requires =
python-dateutil
setup_requires =
python-dateutil
[egg_info]
tag_build =
tag_date = 0

3
setup.py Normal file
View file

@ -0,0 +1,3 @@
from setuptools import setup
setup()

72
sortxml.egg-info/PKG-INFO Normal file
View file

@ -0,0 +1,72 @@
Metadata-Version: 2.1
Name: sortxml
Version: 0.1.0
Summary: A simple XML element sorter
Home-page: https://www.github.com/kopachris/sortxml
Author: Chris Koch
Author-email: kopachris@gmail.com
License: BSD 3-Clause License
Keywords: xml,sort
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
License-File: LICENSE
# sortxml - a simple XML element sorter
This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.
## Using `sort_xml()`:
Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:
```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```
### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key
### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)
## Usage on the command line:
Run `python -m sortxml -h` to display this help text.
Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr
A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:
$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl
### Positional arguments:
* _**input_file**_ File path to the source xml file.
* _**sort_xpath**_ XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ The name of the attribute to use as the sort key.
### Options:
* _**-h, --help**_ show this help message and exit
* _**-v, --version**_ show program's version number and exit
* _**-r, --reverse, --descending**_ Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)

View file

@ -0,0 +1,11 @@
LICENSE
README.md
pyproject.toml
setup.cfg
setup.py
sortxml.py
sortxml.egg-info/PKG-INFO
sortxml.egg-info/SOURCES.txt
sortxml.egg-info/dependency_links.txt
sortxml.egg-info/requires.txt
sortxml.egg-info/top_level.txt

View file

@ -0,0 +1 @@

View file

@ -0,0 +1 @@
python-dateutil

View file

@ -0,0 +1 @@
sortxml

262
sortxml.py Normal file
View file

@ -0,0 +1,262 @@
#!/usr/bin/python310
"""Simple XML element sorter.
This module can be used by importing `sort_xml` or by running standalone from the command-line.
"""
# Copyright (c) 2022, Chris Koch <kopachris@gmail.com>
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# (1) Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# (2) Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
#
# (3)The name of the author may not be used to
# endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
__version__ = (0, 1, 0)
__version_str__ = '.'.join([str(v) for v in __version__])
__description__ = """
A simple XML element sorter. Will sort the children of selected elements
using a given attribute's value or subelement's text as the sort key.
Example usage:
$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl
"""
import argparse as ap
import xml.etree.ElementTree as ET
from pathlib import Path
from io import TextIOWrapper
from codecs import BOM_UTF8
from decimal import Decimal
from dateutil.parser import parse as parse_dt
class NSElement(ET.Element):
"""Subclass of ElementTree.Element which keeps track of its TreeBuilder and namespaces if available."""
def __init__(self, *args, **kwargs):
self._ns_map = dict()
self._builder = None
if 'builder' in kwargs:
builder = kwargs.pop('builder')
self._builder = builder
if hasattr(builder, 'ns_map'):
self._ns_map = builder.ns_map
super().__init__(*args, **kwargs)
def find(self, path, namespaces=None):
if namespaces is None:
namespaces = self._ns_map
return super().find(path, namespaces)
def findall(self, path, namespaces=None):
if namespaces is None:
namespaces = self._ns_map
return super().findall(path, namespaces)
def findtext(self, path, default=None, namespaces=None):
if namespaces is None:
namespaces = self._ns_map
return super().findtext(path, default, namespaces)
def iterfind(self, path, namespaces=None):
if namespaces is None:
namespaces = self._ns_map
return super().iterfind(path, namespaces)
class NSTreeBuilder(ET.TreeBuilder):
"""Subclass of ElementTree.TreeBuilder which adds namespaces in the document to the namespace registry."""
def __init__(self, **kwargs):
self.ns_map = dict()
if 'element_factory' in kwargs:
del kwargs['element_factory']
super().__init__(element_factory=NSElement, **kwargs)
def start_ns(self, prefix, uri):
self.ns_map[prefix] = uri
ET.register_namespace(prefix, uri)
def start(self, tag, attrs):
if self._factory is NSElement:
self._flush()
self._last = e = self._factory(tag, attrs, builder=self)
if self._elem:
self._elem[-1].append(e)
elif self._root is None:
self._root = e
self._elem.append(e)
self._tail = 0
return e
else:
return super().start(tag, attrs)
def _handle_single(self, factory, insert, *args):
if factory is NSElement:
e = factory(*args, builder=self)
if insert:
self._flush()
self._last = e
if self._elem:
self._elem[-1].append(e)
self._tail = 1
return e
else:
return super()._handle_single(factory, insert, *args)
def sort_xml(xml_doc, node_path, sort_attr, use_text=False, sort_as_datetime=False, sort_as_decimal=False,
descending=False):
"""Sort the children of a selection of elements in an XML document. Returns an ElementTree representing the
resulting whole document. ElementTree can easily be converted to string or written to a file like so:
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
Required arguments:
-------------------
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key
Optional arguments:
-------------------
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)
"""
# check parameters
# xml_doc
if isinstance(xml_doc, TextIOWrapper) and xml_doc.readable():
# xml_doc is a readable text stream, let's read it
# but first make sure to remove any byte order marker
if xml_doc.encoding != 'utf-8-sig':
xml_doc.reconfigure(encoding='utf-8-sig')
xml_str = xml_doc.read()
elif isinstance(xml_doc, Path) and xml_doc.is_file():
# xml_doc is a Path object to a file
xml_str = xml_doc.read_text('utf-8-sig') # utf-8-sig to remove byte order marker
elif isinstance(xml_doc, str) and Path(xml_doc).is_file():
# xml_doc is a filename
xml_str = Path(xml_doc).read_text('utf-8-sig')
elif isinstance(xml_doc, str) and len(xml_doc) > 0:
# xml_doc hopefully contains valid XML
if xml_doc.startswith(BOM_UTF8.decode('utf-8')):
xml_str = xml_doc[3:]
else:
xml_str = xml_doc
else:
raise TypeError("sort_xml() requires first parameter must be a string, readable IO stream, or path for a "
f"valid xml file! xml_doc: {repr(xml_doc)}")
# sort_attr
if not (isinstance(sort_attr, str) and len(sort_attr) > 0):
raise TypeError("sort_xml() requires sort attribute must be a non-empty string!\n\t"
f"sort_attr: {repr(sort_attr)}")
else:
sort_attr = sort_attr.strip()
if not (sort_attr.replace('_', '').isalnum() and (sort_attr[0].isalpha() or sort_attr[0] == '_')):
raise ValueError("Sort attribute passed to sort_xml() is an invalid name!\n\t"
f"sort_attr: {repr(sort_attr)}")
# make our element tree using our custom treebuilder and get all the parents we have to sort children of
dom = ET.fromstring(xml_str, ET.XMLParser(target=NSTreeBuilder()))
matching_parents = dom.findall(node_path)
# check what kind of sorting we're doing and do it
# TODO might be faster if we do the check once and then run the appropriate for loop?
for par in matching_parents:
if use_text:
if sort_as_datetime:
par[:] = sorted(par, key=lambda x: parse_dt(x.findtext(sort_attr)), reverse=descending)
elif sort_as_decimal:
par[:] = sorted(par, key=lambda x: Decimal(x.findtext(sort_attr)), reverse=descending)
else:
par[:] = sorted(par, key=lambda x: x.findtext(sort_attr), reverse=descending)
elif sort_as_datetime:
par[:] = sorted(par, key=lambda x: parse_dt(x.get(sort_attr)), reverse=descending)
elif sort_as_decimal:
par[:] = sorted(par, key=lambda x: Decimal(x.get(sort_attr)), reverse=descending)
else:
par[:] = sorted(par, key=lambda x: x.get(sort_attr), reverse=descending)
return ET.ElementTree(dom)
if __name__ == '__main__':
argp = ap.ArgumentParser(description=__description__, formatter_class=ap.RawDescriptionHelpFormatter)
argp.add_argument('-v', '--version', action='version', version=f"%(prog)s -- version {__version_str__}")
argp.add_argument('input_file', type=Path, help="File path to the source xml file.")
argp.add_argument('sort_xpath',
help="XPath-style selector for elements to sort the children of. This has the same limitations "
"as Python's ElementTree module.")
argp.add_argument('sort_attr', help="The name of the attribute to use as the sort key.")
argp.add_argument('-r', '--reverse', '--descending', action='store_true', dest='descending',
help="Sort the child elements in reverse (descending) order.")
argp.add_argument('-t', '--text', '--use-text', action='store_true', dest='use_text',
help="Treat the sort attribute name as the name of a subelement whose text is the sort key.")
sort_style = argp.add_mutually_exclusive_group()
sort_style.add_argument('--datetime', '--as-datetime', action='store_true', dest='as_datetime',
help="Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.")
sort_style.add_argument('--decimal', '--as-decimal', action='store_true', dest='as_decimal',
help="Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.")
argp.add_argument('-o', '--output', type=Path, dest='output_file',
help="File path to the destination file. (Default is to append '_sorted' to the filename.)")
argv = argp.parse_args()
xml_doc = argv.input_file
sort_path = argv.sort_xpath
sort_attr = argv.sort_attr
sort_desc = argv.descending
use_text = argv.use_text
as_dt = argv.as_datetime
as_dec = argv.as_decimal
sorted_xml = sort_xml(xml_doc, sort_path, sort_attr, use_text, as_dt, as_dec, sort_desc)
if not hasattr(argv, 'output_file'):
new_filename = xml_doc.stem + '_sorted'
out_file = xml_doc.with_stem(new_filename)
else:
out_file = argv.output_file
out_file.write_text(ET.tostring(sorted_xml.getroot(), encoding='unicode'), encoding='utf-8')
print(f"Output sorted file as `{out_file}`")