Adding upstream version 0.1.0.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 550be5da31
commit 638d6148c0
12 changed files with 539 additions and 0 deletions
28
LICENSE
Normal file
@@ -0,0 +1,28 @@
Copyright (c) 2022, Chris Koch <kopachris@gmail.com>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

(1) Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.

(3)The name of the author may not be used to
endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
72
PKG-INFO
Normal file
@@ -0,0 +1,72 @@
Metadata-Version: 2.1
Name: sortxml
Version: 0.1.0
Summary: A simple XML element sorter
Home-page: https://www.github.com/kopachris/sortxml
Author: Chris Koch
Author-email: kopachris@gmail.com
License: BSD 3-Clause License
Keywords: xml,sort
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
License-File: LICENSE

# sortxml - a simple XML element sorter

This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.

## Using `sort_xml()`:

Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:

```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```

### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key

### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)

## Usage on the command line:

Run `python -m sortxml -h` to display this help text.

Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr

A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:

$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl

### Positional arguments:
* _**input_file**_ – File path to the source xml file.
* _**sort_xpath**_ – XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ – The name of the attribute to use as the sort key.

### Options:
* _**-h, --help**_ – show this help message and exit
* _**-v, --version**_ – show program's version number and exit
* _**-r, --reverse, --descending**_ – Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ – Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ – Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ – Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ – File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)
57
README.md
Normal file
@@ -0,0 +1,57 @@
# sortxml - a simple XML element sorter

This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.

## Using `sort_xml()`:

Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:

```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```

### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key
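
The namespace remark under `xml_doc` can be illustrated with a minimal sketch that uses only the standard library. It is not sortxml's own parser: the `NamespaceRecorder` class and the sample document are hypothetical, and the sketch assumes Python 3.8 or later, where `XMLParser` forwards namespace declarations to a target's `start_ns()` method.

```python
import xml.etree.ElementTree as ET


class NamespaceRecorder(ET.TreeBuilder):
    """Hypothetical TreeBuilder that remembers prefix -> URI declarations."""

    def __init__(self):
        super().__init__()
        self.ns_map = {}

    def start_ns(self, prefix, uri):
        # Called by XMLParser for each xmlns declaration it encounters.
        self.ns_map[prefix] = uri


# Made-up document with one prefixed namespace.
doc = '<report xmlns:rd="urn:example:rd"><rd:Field Name="b"/><rd:Field Name="a"/></report>'
builder = NamespaceRecorder()
root = ET.fromstring(doc, ET.XMLParser(target=builder))

# The recorded map lets prefixed paths resolve without hand-written URIs.
fields = root.findall('rd:Field', builder.ns_map)
print(builder.ns_map)
print([f.get('Name') for f in fields])
```

An already-parsed ElementTree no longer carries its prefix declarations, which is presumably why the function wants the raw document instead.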

### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)
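
The effect of `sort_as_decimal` and `descending` can be sketched with the standard library alone. This example does not call sortxml. The sample document and variable names are made up, and it only mirrors the slice-assignment idiom (`par[:] = sorted(...)`) that appears in `sortxml.py`.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

root = ET.fromstring('<items><item n="10"/><item n="9"/><item n="1"/></items>')

# Plain string keys compare character by character, so '10' lands right after '1'.
as_text = [e.get('n') for e in sorted(root, key=lambda e: e.get('n'))]

# Decimal keys compare numerically.
as_decimal = [e.get('n') for e in sorted(root, key=lambda e: Decimal(e.get('n')))]

# Slice assignment reorders the children of the parent element in place.
root[:] = sorted(root, key=lambda e: Decimal(e.get('n')), reverse=True)
descending = [e.get('n') for e in root]

print(as_text, as_decimal, descending)
```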

## Usage on the command line:

Run `python -m sortxml -h` to display this help text.

Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr

A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:

$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl

### Positional arguments:
* _**input_file**_ – File path to the source xml file.
* _**sort_xpath**_ – XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ – The name of the attribute to use as the sort key.

### Options:
* _**-h, --help**_ – show this help message and exit
* _**-v, --version**_ – show program's version number and exit
* _**-r, --reverse, --descending**_ – Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ – Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ – Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ – Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ – File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)
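
The default output name described for `-o` can be sketched as follows. `default_output_path` is a hypothetical helper written for illustration, not a function exported by sortxml. It uses `Path.with_name()` so that it also runs on Python versions older than 3.9, which lack `Path.with_stem()`.

```python
from pathlib import Path


def default_output_path(input_file: Path) -> Path:
    """Hypothetical helper: put '_sorted' between the stem and the extension."""
    return input_file.with_name(input_file.stem + '_sorted' + input_file.suffix)


print(default_output_path(Path('ARForm_orig.rdl')))
```

With the input file from the example usage, this yields `ARForm_orig_sorted.rdl` in the same directory as the input.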
3
pyproject.toml
Normal file
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
28
setup.cfg
Normal file
@@ -0,0 +1,28 @@
[metadata]
name = sortxml
version = attr: sortxml.__version_str__
description = A simple XML element sorter
long_description = file: README.md
keywords = xml, sort
license = BSD 3-Clause License
license_files = LICENSE
url = https://www.github.com/kopachris/sortxml
author = Chris Koch
author_email = kopachris@gmail.com
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: BSD License
    Topic :: Text Processing :: Markup :: XML
    Topic :: Utilities

[options]
py_modules = sortxml
install_requires =
    python-dateutil
setup_requires =
    python-dateutil

[egg_info]
tag_build =
tag_date = 0
3
setup.py
Normal file
@@ -0,0 +1,3 @@
from setuptools import setup

setup()
72
sortxml.egg-info/PKG-INFO
Normal file
@@ -0,0 +1,72 @@
Metadata-Version: 2.1
Name: sortxml
Version: 0.1.0
Summary: A simple XML element sorter
Home-page: https://www.github.com/kopachris/sortxml
Author: Chris Koch
Author-email: kopachris@gmail.com
License: BSD 3-Clause License
Keywords: xml,sort
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
License-File: LICENSE

# sortxml - a simple XML element sorter

This module can be used by importing `sortxml.sort_xml` or by running standalone from the command-line.

## Using `sort_xml()`:

Returns an ElementTree representing the resulting whole document. ElementTree can easily be converted to string or written to a file like so:

```python
>>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
# Or...
>>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')
```

### Required arguments:
* `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
an ElementTree instance because we need to use our own parser to keep track of namespaces.
* `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
of the etree module
* `sort_attr` -- the attribute of the child elements to use as the sort key

### Optional arguments:
* `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
sort key (default: False)
* `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
chronologically (default: False, mutually exclusive with `sort_as_decimal`)
* `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
'10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
* `descending` -- sort in descending order instead of ascending (default: False)

## Usage on the command line:

Run `python -m sortxml -h` to display this help text.

Usage: sortxml [-h] [-v] [-r] [-t] [--datetime | --decimal] [-o OUTPUT_FILE] input_file sort_xpath sort_attr

A simple XML element sorter. Will sort the children of selected elements using a given attribute's value or subelement's text as the sort key.
Example usage:

$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl

### Positional arguments:
* _**input_file**_ – File path to the source xml file.
* _**sort_xpath**_ – XPath-style selector for elements to sort the children of. This has the same limitations as Python's ElementTree module.
* _**sort_attr**_ – The name of the attribute to use as the sort key.

### Options:
* _**-h, --help**_ – show this help message and exit
* _**-v, --version**_ – show program's version number and exit
* _**-r, --reverse, --descending**_ – Sort the child elements in reverse (descending) order.
* _**-t, --text, --use-text**_ – Treat the sort attribute name as the name of a subelement whose text is the sort key.
* _**--datetime, --as-datetime**_ – Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.
* _**--decimal, --as-decimal**_ – Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.
* _**-o OUTPUT_FILE, --output OUTPUT_FILE**_ – File path to the destination file. (Default is to append '_sorted' to the filename before the extension.)
11
sortxml.egg-info/SOURCES.txt
Normal file
@@ -0,0 +1,11 @@
LICENSE
README.md
pyproject.toml
setup.cfg
setup.py
sortxml.py
sortxml.egg-info/PKG-INFO
sortxml.egg-info/SOURCES.txt
sortxml.egg-info/dependency_links.txt
sortxml.egg-info/requires.txt
sortxml.egg-info/top_level.txt
1
sortxml.egg-info/dependency_links.txt
Normal file
@@ -0,0 +1 @@

1
sortxml.egg-info/requires.txt
Normal file
@@ -0,0 +1 @@
python-dateutil
1
sortxml.egg-info/top_level.txt
Normal file
@@ -0,0 +1 @@
sortxml
262
sortxml.py
Normal file
@@ -0,0 +1,262 @@
#!/usr/bin/python310

"""Simple XML element sorter.

This module can be used by importing `sort_xml` or by running standalone from the command-line.

"""

# Copyright (c) 2022, Chris Koch <kopachris@gmail.com>
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# (1) Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# (2) Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
#
# (3)The name of the author may not be used to
# endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

__version__ = (0, 1, 0)
__version_str__ = '.'.join([str(v) for v in __version__])

__description__ = """
A simple XML element sorter. Will sort the children of selected elements
using a given attribute's value or subelement's text as the sort key.
Example usage:
$ python sortxml.py ARForm_orig.rdl "./DataSets/DataSet[@Name='ARForm']/Fields" Name -o ARForm.rdl
"""

import argparse as ap
import xml.etree.ElementTree as ET
from pathlib import Path
from io import TextIOWrapper
from codecs import BOM_UTF8
from decimal import Decimal
from dateutil.parser import parse as parse_dt


class NSElement(ET.Element):
    """Subclass of ElementTree.Element which keeps track of its TreeBuilder and namespaces if available."""

    def __init__(self, *args, **kwargs):
        self._ns_map = dict()
        self._builder = None
        if 'builder' in kwargs:
            builder = kwargs.pop('builder')
            self._builder = builder
            if hasattr(builder, 'ns_map'):
                self._ns_map = builder.ns_map
        super().__init__(*args, **kwargs)

    def find(self, path, namespaces=None):
        if namespaces is None:
            namespaces = self._ns_map
        return super().find(path, namespaces)

    def findall(self, path, namespaces=None):
        if namespaces is None:
            namespaces = self._ns_map
        return super().findall(path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        if namespaces is None:
            namespaces = self._ns_map
        return super().findtext(path, default, namespaces)

    def iterfind(self, path, namespaces=None):
        if namespaces is None:
            namespaces = self._ns_map
        return super().iterfind(path, namespaces)


class NSTreeBuilder(ET.TreeBuilder):
    """Subclass of ElementTree.TreeBuilder which adds namespaces in the document to the namespace registry."""

    def __init__(self, **kwargs):
        self.ns_map = dict()
        if 'element_factory' in kwargs:
            del kwargs['element_factory']
        super().__init__(element_factory=NSElement, **kwargs)

    def start_ns(self, prefix, uri):
        self.ns_map[prefix] = uri
        ET.register_namespace(prefix, uri)

    def start(self, tag, attrs):
        if self._factory is NSElement:
            self._flush()
            self._last = e = self._factory(tag, attrs, builder=self)
            if self._elem:
                self._elem[-1].append(e)
            elif self._root is None:
                self._root = e
            self._elem.append(e)
            self._tail = 0
            return e
        else:
            return super().start(tag, attrs)

    def _handle_single(self, factory, insert, *args):
        if factory is NSElement:
            e = factory(*args, builder=self)
            if insert:
                self._flush()
                self._last = e
                if self._elem:
                    self._elem[-1].append(e)
                self._tail = 1
            return e
        else:
            return super()._handle_single(factory, insert, *args)


def sort_xml(xml_doc, node_path, sort_attr, use_text=False, sort_as_datetime=False, sort_as_decimal=False,
             descending=False):
    """Sort the children of a selection of elements in an XML document. Returns an ElementTree representing the
    resulting whole document. ElementTree can easily be converted to string or written to a file like so:

    >>> foo_str = ET.tostring(sort_xml(xml_doc, node_path, sort_attr).getroot())
    >>> sort_xml(xml_doc, node_path, sort_attr).write('foo.xml')

    Required arguments:
    -------------------
    * `xml_doc` -- a text IO stream (such as an open file object), Path object pointing to an XML
    file, string representing the file path, or string containing the file contents of a valid XML file. Can't take
    an ElementTree instance because we need to use our own parser to keep track of namespaces.
    * `node_path` -- a string containing the path to the node you want to sort the children of in the XPath language
    of the etree module
    * `sort_attr` -- the attribute of the child elements to use as the sort key

    Optional arguments:
    -------------------
    * `use_text` -- use `sort_attr` as the name of a subelement of the path's children whose text will be the
    sort key (default: False)
    * `sort_as_datetime` -- try to parse the values of the sort key as a datetime using the `dateutil` module and sort
    chronologically (default: False, mutually exclusive with `sort_as_decimal`)
    * `sort_as_decimal` -- try to parse the values of the sort key as a decimal and sort numerically (useful to keep
    '10' from showing up right after '1') (default: False, mutually exclusive with `sort_as_datetime`)
    * `descending` -- sort in descending order instead of ascending (default: False)

    """
    # check parameters

    # xml_doc
    if isinstance(xml_doc, TextIOWrapper) and xml_doc.readable():
        # xml_doc is a readable text stream, let's read it
        # but first make sure to remove any byte order marker

        if xml_doc.encoding != 'utf-8-sig':
            xml_doc.reconfigure(encoding='utf-8-sig')

        xml_str = xml_doc.read()
    elif isinstance(xml_doc, Path) and xml_doc.is_file():
        # xml_doc is a Path object to a file
        xml_str = xml_doc.read_text('utf-8-sig')  # utf-8-sig to remove byte order marker
    elif isinstance(xml_doc, str) and Path(xml_doc).is_file():
        # xml_doc is a filename
        xml_str = Path(xml_doc).read_text('utf-8-sig')
    elif isinstance(xml_doc, str) and len(xml_doc) > 0:
        # xml_doc hopefully contains valid XML
        if xml_doc.startswith(BOM_UTF8.decode('utf-8')):
            xml_str = xml_doc[1:]  # the decoded BOM is a single character (U+FEFF), not three
        else:
            xml_str = xml_doc
    else:
        raise TypeError("sort_xml() requires first parameter must be a string, readable IO stream, or path for a "
                        f"valid xml file! xml_doc: {repr(xml_doc)}")

    # sort_attr
    if not (isinstance(sort_attr, str) and len(sort_attr) > 0):
        raise TypeError("sort_xml() requires sort attribute must be a non-empty string!\n\t"
                        f"sort_attr: {repr(sort_attr)}")
    else:
        sort_attr = sort_attr.strip()
        if not (sort_attr.replace('_', '').isalnum() and (sort_attr[0].isalpha() or sort_attr[0] == '_')):
            raise ValueError("Sort attribute passed to sort_xml() is an invalid name!\n\t"
                             f"sort_attr: {repr(sort_attr)}")

    # make our element tree using our custom treebuilder and get all the parents we have to sort children of

    dom = ET.fromstring(xml_str, ET.XMLParser(target=NSTreeBuilder()))
    matching_parents = dom.findall(node_path)

    # check what kind of sorting we're doing and do it
    # TODO might be faster if we do the check once and then run the appropriate for loop?
    for par in matching_parents:
        if use_text:
            if sort_as_datetime:
                par[:] = sorted(par, key=lambda x: parse_dt(x.findtext(sort_attr)), reverse=descending)
            elif sort_as_decimal:
                par[:] = sorted(par, key=lambda x: Decimal(x.findtext(sort_attr)), reverse=descending)
            else:
                par[:] = sorted(par, key=lambda x: x.findtext(sort_attr), reverse=descending)
        elif sort_as_datetime:
            par[:] = sorted(par, key=lambda x: parse_dt(x.get(sort_attr)), reverse=descending)
        elif sort_as_decimal:
            par[:] = sorted(par, key=lambda x: Decimal(x.get(sort_attr)), reverse=descending)
        else:
            par[:] = sorted(par, key=lambda x: x.get(sort_attr), reverse=descending)

    return ET.ElementTree(dom)


if __name__ == '__main__':
    argp = ap.ArgumentParser(description=__description__, formatter_class=ap.RawDescriptionHelpFormatter)
    argp.add_argument('-v', '--version', action='version', version=f"%(prog)s -- version {__version_str__}")
    argp.add_argument('input_file', type=Path, help="File path to the source xml file.")
    argp.add_argument('sort_xpath',
                      help="XPath-style selector for elements to sort the children of. This has the same limitations "
                           "as Python's ElementTree module.")
    argp.add_argument('sort_attr', help="The name of the attribute to use as the sort key.")
    argp.add_argument('-r', '--reverse', '--descending', action='store_true', dest='descending',
                      help="Sort the child elements in reverse (descending) order.")
    argp.add_argument('-t', '--text', '--use-text', action='store_true', dest='use_text',
                      help="Treat the sort attribute name as the name of a subelement whose text is the sort key.")
    sort_style = argp.add_mutually_exclusive_group()
    sort_style.add_argument('--datetime', '--as-datetime', action='store_true', dest='as_datetime',
                            help="Try to parse the sort key as a date/time value. Mutually exclusive with --decimal.")
    sort_style.add_argument('--decimal', '--as-decimal', action='store_true', dest='as_decimal',
                            help="Try to parse the sort key as a decimal number. Mutually exclusive with --datetime.")
    argp.add_argument('-o', '--output', type=Path, dest='output_file',
                      help="File path to the destination file. (Default is to append '_sorted' to the filename.)")

    argv = argp.parse_args()

    xml_doc = argv.input_file
    sort_path = argv.sort_xpath
    sort_attr = argv.sort_attr
    sort_desc = argv.descending
    use_text = argv.use_text
    as_dt = argv.as_datetime
    as_dec = argv.as_decimal

    sorted_xml = sort_xml(xml_doc, sort_path, sort_attr, use_text, as_dt, as_dec, sort_desc)

    if argv.output_file is None:  # argparse always sets the attribute; it is None when -o is omitted
        new_filename = xml_doc.stem + '_sorted'
        out_file = xml_doc.with_stem(new_filename)
    else:
        out_file = argv.output_file

    out_file.write_text(ET.tostring(sorted_xml.getroot(), encoding='unicode'), encoding='utf-8')

    print(f"Output sorted file as `{out_file}`")