What are XML files in Python used for?
XML stands for eXtensible Markup Language. It is a text-based format for structuring and sharing data across networks, between programs and between people. Although it resembles HTML, which also has its roots in SGML, XML follows much stricter formatting rules.
These strict rules make XML predictable and readable, which makes errors easier to spot and resolve. Virtually all HTML tags are predefined, whereas XML tags are defined by the author of the document, which is what makes the format extensible.
Structure of XML documents
All XML documents must have a single root element, which is the parent of all other elements. Most XML documents also begin with an optional prolog (the XML declaration). If the prolog is present, it must be the very first thing in the document, on the first line.
The XML prolog is used to specify the XML version and the character encoding of the document, which is often UTF-8 so that international characters can be represented.
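The placement rule for the prolog can be checked with a short sketch. The one-element document here is a made-up placeholder, and the standard library parser is used only as one convenient way to test the rule.

```python
import xml.etree.ElementTree as ET

# Prolog on the first line, declaring the XML version and the encoding.
# The <catalogue/> document is a hypothetical placeholder.
with_prolog = b'<?xml version="1.0" encoding="UTF-8"?>\n<catalogue/>'
root = ET.fromstring(with_prolog)
print(root.tag)  # catalogue

# The same document with a blank line in front of the prolog.
misplaced_prolog = b"\n" + with_prolog

try:
    ET.fromstring(misplaced_prolog)
    prolog_accepted = True
except ET.ParseError as exc:
    prolog_accepted = False
    print("Rejected:", exc)
```

The second document differs only by one leading newline, which is enough for the parser to refuse it.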
In addition, every opening tag must have a matching closing tag; for example, <book> must be closed by </book>. An element with no content may instead be written as a self-closing tag such as <book/>. A document in which some elements are left unclosed is not well-formed, and a parser will reject it. The prolog is not an element of the document, so it is the one exception to this rule and has no closing tag.
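As a rough illustration of the closing-tag rule, the sketch below asks Python's standard library parser whether a few small snippets are acceptable. The helper name is_well_formed and the snippet contents are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


closed = "<book><title>Example Title</title></book>"
self_closing = "<book/>"
unclosed = "<book><title>Example Title</book>"  # <title> is never closed

print(is_well_formed(closed))        # True
print(is_well_formed(self_closing))  # True
print(is_well_formed(unclosed))      # False
```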
Tags in XML documents are case sensitive: the case used in an opening tag must match the case of its closing tag. A single mismatched pair makes the entire document invalid. For example, <Book> ... </Book> is valid, while <Book> ... </book> is not.
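The case rule can be tried out with a small sketch like the following. The two snippets are invented for the illustration, and the standard library parser stands in for any XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: only the case of the closing tag differs.
snippets = {
    "matching": "<Book><title>Example Title</title></Book>",
    "mismatched": "<Book><title>Example Title</title></book>",
}

results = {}
for label, text in snippets.items():
    try:
        ET.fromstring(text)
        results[label] = True
        print(label, "-> parsed")
    except ET.ParseError as exc:
        results[label] = False
        print(label, "-> rejected:", exc)
```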
XML elements can also have attributes with corresponding values, and attribute values must always be enclosed in quotes. For example, in <book id="bk101"> the book element has an attribute id with the value "bk101".
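A minimal sketch of how such an attribute looks from Python, assuming a one-element snippet invented for the example. The attribute name isbn is hypothetical and is used only to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book id="bk101"><title>Example Title</title></book>')

print(book.attrib)       # {'id': 'bk101'}
print(book.get("id"))    # bk101
print(book.get("isbn"))  # None, because there is no such attribute

# An attribute value without quotes is not accepted by the parser.
try:
    ET.fromstring("<book id=bk101></book>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(unquoted_accepted)  # False
```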
Parsing XML files in Python using the ElementTree Library
The sample XML document used in this tutorial contains information about books, with <catalogue> </catalogue> as the root element of the document.
The document begins with a prolog specifying the XML version: <?xml version="1.0"?>. It is followed by <book> elements such as <book id="bk101"> </book>, each of which has an id attribute and child elements such as <author>, <title>, <genre>, <price>, <publish_date> and <description>, with their corresponding closing tags.
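A document with this shape might look like the sketch below. The element names and the id bk101 come from the description above; the second id and all the values (authors, titles, prices and so on) are placeholders made up for illustration. The document is held in a Python string so that the example runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the structure follows the tutorial.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101">
    <author>Doe, Jane</author>
    <title>Example Title One</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>A placeholder description.</description>
  </book>
  <book id="bk102">
    <author>Roe, Richard</author>
    <title>Example Title Two</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>Another placeholder description.</description>
  </book>
</catalogue>
"""

root = ET.fromstring(SAMPLE_XML)
child_tags = [child.tag for child in root[0]]

print(root.tag)    # catalogue
print(len(root))   # 2
print(child_tags)
```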
ElementTree is a built-in Python library (the xml.etree.ElementTree module) that we can use to load and manipulate XML files using its range of functions. Navigating through an XML file is usually simple because the document is structured as a tree.
Since ElementTree is already part of the Python standard library, we simply need to import it at the top of our program. Apart from ElementTree, third-party libraries such as BeautifulSoup, which has to be installed separately, can also be used to parse XML files.
Importing ElementTree under an alias, conventionally ET, is a common practice that lets us call its functions without typing the full name of the module every time. To load the XML file, we pass its name to the ET.parse() function and assign the result to a variable named tree. Calling tree.getroot() then returns the root element of the document.
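The loading step could look roughly like this. The file name books.xml and its contents are assumptions made for the example; the sketch writes the file into a temporary directory first so that it can be run as is.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, trimmed-down sample document.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101">
    <author>Doe, Jane</author>
    <title>Example Title One</title>
  </book>
</catalogue>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = Path(tmp_dir) / "books.xml"  # assumed file name
    xml_path.write_text(SAMPLE_XML, encoding="utf-8")

    tree = ET.parse(xml_path)  # load and parse the file
    root = tree.getroot()      # the <catalogue> element

print(root.tag)  # catalogue
```

With a real file on disk, the two lines that matter are the call to ET.parse() and the call to getroot().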
Parsing XML files in Python with a for loop
Using a for loop we can iterate through each of the child elements of the root element. We can also access the attributes of each element and print them out. For example, a simple for loop over the root can print the attribute of every book, which in this case is the 'id' attribute.
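A minimal sketch of such a loop, assuming a small in-memory document with two books. The ids other than bk101 and all the values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample document.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101"><title>Example Title One</title></book>
  <book id="bk102"><title>Example Title Two</title></book>
</catalogue>
"""

root = ET.fromstring(SAMPLE_XML)

book_ids = []
for child in root:
    # child.tag is the element name, child.attrib a dict of its attributes
    print(child.tag, child.attrib)
    book_ids.append(child.get("id"))

print(book_ids)  # ['bk101', 'bk102']
```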
We can go further into the tree and print the sub-elements of the <book> element, such as the author or the title. Instead of printing the book attribute, we loop over the <book> elements returned by the root.findall() method and look up the sub-element we want by name with find().
Basically, findall() returns the children of an element whose tag matches the name specified within the parentheses, and find() returns the first such child. In this way we can access <title> </title>, which is one of the sub-elements under each book element.
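One possible way to write this lookup is sketched below, using a small placeholder document. The combination of findall() on the root and find() on each book is an assumption about how the step might be coded, not necessarily the original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample document.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101">
    <author>Doe, Jane</author>
    <title>Example Title One</title>
  </book>
  <book id="bk102">
    <author>Roe, Richard</author>
    <title>Example Title Two</title>
  </book>
</catalogue>
"""

root = ET.fromstring(SAMPLE_XML)

titles = []
for book in root.findall("book"):    # every <book> directly under the root
    title = book.find("title").text  # text of the first <title> child
    titles.append(title)
    print(title)
```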
Parse XML files in Python with the findall() method
The findall() method only searches the direct children of the element it is called on, so root.findall() gives us the first layer of the tree, the <book> elements. We can access the child elements of each book in a similar manner, for example the title and price elements at the same time.
This gives us the title of every book in the <book> elements, printed alongside its corresponding price in the terminal. Since the values are now stored in variables, we can process them further in any way we need.
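Reading two child elements per book might look like the following sketch. The document and its prices are placeholders, and converting the price text to a float is an extra step added here to show that the values can be processed further.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample document.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101">
    <title>Example Title One</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <title>Example Title Two</title>
    <price>5.95</price>
  </book>
</catalogue>
"""

root = ET.fromstring(SAMPLE_XML)

books = []
for book in root.findall("book"):
    title = book.find("title").text
    price = float(book.find("price").text)  # element text is always a string
    books.append((title, price))
    print(title, price)
```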
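Alternatively, we can use the root.iter() method, which walks through the whole tree, starting at the root and descending to any depth. When a tag name is passed to it, it yields only the elements with that tag, which is convenient when we want one kind of element. For instance, root.iter('author') visits every <author> element without first looping over the <book> elements.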
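A short sketch of iter() on a placeholder document, showing both the filtered and the unfiltered form. The author names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample document.
SAMPLE_XML = """<?xml version="1.0"?>
<catalogue>
  <book id="bk101">
    <author>Doe, Jane</author>
    <title>Example Title One</title>
  </book>
  <book id="bk102">
    <author>Roe, Richard</author>
    <title>Example Title Two</title>
  </book>
</catalogue>
"""

root = ET.fromstring(SAMPLE_XML)

# With a tag name, iter() yields matching elements at any depth.
authors = [author.text for author in root.iter("author")]
print(authors)  # ['Doe, Jane', 'Roe, Richard']

# Without a tag name, it yields every element, starting with the root itself.
all_tags = [element.tag for element in root.iter()]
print(all_tags)
```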
Summary
This is how to parse XML files in Python. If you’d like to see more programming tutorials, check out our YouTube channel, where we have plenty of Python video tutorials in English.
In our Python Programming Tutorials series, you’ll find useful materials which will help you improve your programming skills and speed up the learning process.