Goodbye os.path: 15 Pathlib Tricks to Quickly Master the File System in Python

No headaches and unreadable code from `os.path`

Pathlib may be my favourite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that’s saying something. Pathlib helps me turn code like this, written in os.path:

import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)
         if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")]

into this:

from pathlib import Path

dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))
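Here is a self-contained sketch that checks whether the two snippets agree. It builds a throwaway directory with tempfile (the file names are made up), and it exposes one subtle difference: glob matches directories as well as files, so you need an is_file() filter for a strict equivalent of the os.path version.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo content: two .txt files, one .csv, and a directory named like a .txt
    for name in ("a.txt", "b.txt", "c.csv"):
        Path(tmp, name).write_text("demo")
    Path(tmp, "notes.txt").mkdir()

    # os.path version: string paths everywhere
    os_files = [os.path.join(tmp, f) for f in os.listdir(tmp)
                if os.path.isfile(os.path.join(tmp, f)) and f.endswith(".txt")]

    # pathlib version; glob also yields directories, so keep regular files only
    dir_path = Path(tmp)
    pl_files = [p for p in dir_path.glob("*.txt") if p.is_file()]

print(sorted(os_files) == sorted(str(p) for p in pl_files))  # True
```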

Pathlib came out in Python 3.4 as a replacement for the nightmare that was os.path. It also marked an important milestone for the Python language as a whole: they finally turned every single thing into an object (even nothing).

The biggest drawback of os.path was treating system paths as strings, which led to unreadable, messy code and a steep learning curve.

By representing paths as fully-fledged objects, Pathlib solves all these issues and brings elegance, consistency, and a breath of fresh air to path handling.

And this long-overdue article of mine will outline some of the best functions, features, and tricks of pathlib for tasks that would have been truly horrible experiences in os.path.

Learning these features of Pathlib will make everything related to paths and files easier for you as a data professional, especially in data processing workflows where you have to move around thousands of images, CSVs, or audio files.

Let’s get began!

Working with paths

1. Creating paths

Almost all features of pathlib are accessible through its Path class, which you can use to create paths to files and directories.

There are a few ways to create paths with Path. First, there are class methods like cwd and home for the current working directory and the user’s home directory:

from pathlib import Path

Path.cwd()

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib')

Path.home()

PosixPath('/home/bexgboost')

You can also create paths from strings:

p = Path("documents")
p

PosixPath('documents')

Joining paths is a breeze in Pathlib with the forward slash operator:

data_dir = Path(".") / "data"
csv_file = data_dir / "file.csv"

print(data_dir)
print(csv_file)

data
data/file.csv
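The / operator is shorthand for joinpath, and it accepts a plain string on either side as long as one operand is a path object. One behaviour worth knowing is that joining an absolute component discards everything before it, just as os.path.join does. The sketch below uses PurePosixPath (with made-up names) so the output is the same on every OS:

```python
from pathlib import PurePosixPath

base = PurePosixPath("data")

# The / operator and joinpath() build the same path
assert base / "raw" / "file.csv" == base.joinpath("raw", "file.csv")

# A str on the left also works, because paths implement __rtruediv__
print("project" / base)            # project/data

# An absolute component resets the path, like os.path.join
print(base / "/tmp" / "file.csv")  # /tmp/file.csv
```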

Please, don’t let anyone ever catch you using os.path.join after this.

To check whether a path exists, you can use the boolean method exists:

data_dir.exists()

True

csv_file.exists()

True

Sometimes it isn’t obvious from the path alone whether it points to a directory or a file. To check, you can use the is_dir or is_file methods:

data_dir.is_dir()

True

csv_file.is_file()

True

Most paths you work with will be relative to your current directory. However, there are cases where you have to provide the exact location of a file or directory so that any Python script can reach it, regardless of where the script runs from. That is when you use absolute paths:

csv_file.absolute()

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/data/file.csv')
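Note that absolute() only prefixes the current working directory. It does not clean up .. segments or follow symlinks. If you want a canonical path, resolve() does both, and with its default strict=False it also works for paths that don’t exist. A quick comparison using a hypothetical relative path:

```python
from pathlib import Path

rel = Path("data") / ".." / "data" / "file.csv"  # hypothetical relative path

abs_path = rel.absolute()   # cwd + path, '..' kept as-is
real_path = rel.resolve()   # '..' collapsed, symlinks followed

print(".." in abs_path.parts)   # True
print(".." in real_path.parts)  # False
print(real_path.is_absolute())  # True
```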

Finally, if you have the misfortune of working with libraries that still require string paths, you can call str(path):

str(Path.home())

'/home/bexgboost'

Most libraries in the data stack, including sklearn, pandas, matplotlib, and seaborn, have long supported Path objects.

2. Path attributes

Path objects have many useful attributes. Let’s see some examples using this path object, which points to an image file:

image_file = Path("images/midjourney.png").absolute()

image_file

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png')

Let’s start with parent. It returns a path object one level up from the current path, i.e. the directory that contains the file:

image_file.parent

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images')

Sometimes you only want the file name instead of the whole path. There is an attribute for that:

image_file.name

'midjourney.png'

which returns only the file name, extension included.

There is also stem, for the file name without the suffix:

image_file.stem

'midjourney'

Or suffix, for the file extension itself, dot included:

image_file.suffix

'.png'
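For names with more than one dot, suffix reports only the last extension. suffixes lists all of them, and with_suffix swaps the final one. A short illustration using a hypothetical archive name:

```python
from pathlib import PurePath

archive = PurePath("backup.tar.gz")  # hypothetical file name

print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # backup.tar  (only the last suffix is stripped)

# Replace the final suffix, e.g. to derive an output file name
print(archive.with_suffix(".zip"))  # backup.tar.zip
```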

If you want to split a path into its components, you can use parts instead of str.split('/'):

image_file.parts

('/',
 'home',
 'bexgboost',
 'articles',
 '2023',
 '4_april',
 '1_pathlib',
 'images',
 'midjourney.png')

If you want Path objects for every ancestor directory instead, you can use the parents attribute, an immutable sequence that runs from the immediate parent up to the root:

for i in image_file.parents:
    print(i)

/home/bexgboost/articles/2023/4_april/1_pathlib/images
/home/bexgboost/articles/2023/4_april/1_pathlib
/home/bexgboost/articles/2023/4_april
/home/bexgboost/articles/2023
/home/bexgboost/articles
/home/bexgboost
/home
/
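parents also supports indexing, and parents[0] is the same as parent. Combined with relative_to, this makes it easy to express a path relative to a project root. The paths below are hypothetical, and PurePosixPath keeps the example OS-independent:

```python
from pathlib import PurePosixPath

image = PurePosixPath("/home/user/project/images/cat.png")  # hypothetical path

print(image.parents[0])  # /home/user/project/images  (same as image.parent)
print(image.parents[1])  # /home/user/project

# Express the file relative to the project root
root = image.parents[1]
print(image.relative_to(root))  # images/cat.png
```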

Working with recordsdata

[Image: Classified files. Source: Midjourney]

To create files and write to them, you don’t need the open function anymore. Just create a Path object and call write_text or write_bytes on it:

markdown = data_dir / "file.md"

# Create (or overwrite) the file and write text
markdown.write_text("# This is a test markdown")

Or, if you already have a file, you can use read_text or read_bytes:

markdown.read_text()

'# This is a test markdown'

len(image_file.read_bytes())
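Both write_text and read_text accept an encoding argument, and passing one explicitly is safer because the default depends on the platform. write_text also returns the number of characters written, which is not always the number of bytes on disk. A small sketch in a temporary directory (the file name is made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"  # hypothetical file

    n_chars = note.write_text("# Café", encoding="utf-8")
    n_bytes = len(note.read_bytes())
    text = note.read_text(encoding="utf-8")

print(n_chars)  # 6  characters written
print(n_bytes)  # 7  bytes on disk ('é' takes two bytes in UTF-8)
print(text)     # # Café
```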

However, note that write_text and write_bytes overwrite the existing contents of a file.

# Write new text to an existing file
markdown.write_text("## This is a new line")

# The file is overwritten
markdown.read_text()

'## This is a new line'

To append new content to an existing file, use the open method of Path objects in "a" (append) mode:

# Append text
with markdown.open(mode="a") as file:
    file.write("\n### This is the second line")

markdown.read_text()

'## This is a new line\n### This is the second line'

Renaming files is also common. The rename method accepts the destination path for the renamed file.

To build the destination path in the same directory, i.e. to just rename the file, you can call with_stem (Python 3.9+) on the current path, which replaces the stem of the original file name:

renamed_md = markdown.with_stem("new_markdown")

markdown.rename(renamed_md)

PosixPath('data/new_markdown.md')

Above, file.md is renamed to new_markdown.md.
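rename returns the new Path (Python 3.8+), so you can keep working with it directly, and with_suffix and with_name build destination paths the same way with_stem does. One caveat: if the target already exists, rename silently overwrites it on Unix but raises an error on Windows, while replace overwrites on both. A sketch in a temporary directory with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.md"  # hypothetical file
    report.write_text("# Report")

    # Change the extension; rename() returns the new Path
    as_txt = report.rename(report.with_suffix(".txt"))

    # Change the whole name; replace() would also overwrite an existing target
    final = as_txt.replace(as_txt.with_name("final.txt"))

    old_gone = not report.exists() and not as_txt.exists()
    contents = final.read_text()

print(final.name, contents)  # final.txt # Report
```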

Let’s see the file size through stat().st_size:

# Display file size
renamed_md.stat().st_size

49  # in bytes

or when the file was last modified, which was a few seconds ago:

from datetime import datetime

modified_timestamp = renamed_md.stat().st_mtime
datetime.fromtimestamp(modified_timestamp)

datetime.datetime(2023, 4, 3, 13, 32, 45, 542693)

st_mtime is a Unix timestamp, i.e. the number of seconds since January 1, 1970 (UTC). To make it readable, you can use the fromtimestamp method of datetime, which converts it to local time.
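Combining stat() with glob gives a compact way to find, say, the most recently modified file in a folder. newest_file below is a hypothetical helper written for this illustration (it raises ValueError if nothing matches), and the demo sets modification times with os.utime so the result is predictable:

```python
import os
import tempfile
from pathlib import Path


def newest_file(directory: Path, pattern: str = "*") -> Path:
    """Hypothetical helper: the file matching `pattern` with the latest st_mtime."""
    files = [p for p in directory.glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for i, name in enumerate(["a.csv", "b.csv", "c.csv"]):  # made-up files
        f = folder / name
        f.write_text("x" * (i + 1))
        os.utime(f, (1_000_000 + i, 1_000_000 + i))  # (atime, mtime) in seconds

    latest = newest_file(folder, "*.csv")
    latest_size = latest.stat().st_size

print(latest.name, latest_size)  # c.csv 3
```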

To remove unwanted files, you can unlink them:

renamed_md.unlink(missing_ok=True)

Setting missing_ok=True (Python 3.8+) suppresses the FileNotFoundError that unlink would otherwise raise if the file doesn’t exist.

Working with directories

There are a few neat tricks for working with directories in Pathlib. First, let’s see how to create directories recursively.

new_dir = (
    Path.cwd()
    / "new_dir"
    / "child_dir"
    / "grandchild_dir"
)

new_dir.exists()

False

new_dir doesn’t exist, so let’s create it along with all its missing parent directories:

new_dir.mkdir(parents=True, exist_ok=True)

By default, mkdir creates only the last component of the given path and fails if the intermediate parents don’t exist; setting parents=True creates them as well. exist_ok=True additionally suppresses the error if the directory already exists.
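To see what those two flags protect against, here is what happens without them. The directory names are hypothetical, and everything lives in a temporary folder:

```python
import tempfile
from pathlib import Path

events = []

with tempfile.TemporaryDirectory() as tmp:
    deep = Path(tmp) / "a" / "b" / "c"  # hypothetical nested directories

    try:
        deep.mkdir()  # intermediate 'a' and 'b' don't exist yet
    except FileNotFoundError:
        events.append("FileNotFoundError")

    deep.mkdir(parents=True)  # creates a, a/b and a/b/c

    try:
        deep.mkdir(parents=True)  # 'c' already exists
    except FileExistsError:
        events.append("FileExistsError")

    deep.mkdir(parents=True, exist_ok=True)  # no error this time
    created = deep.is_dir()

print(events)  # ['FileNotFoundError', 'FileExistsError']
```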

To remove empty directories, you can use rmdir. If the path is nested, only its last directory is deleted:

# Removes only grandchild_dir
new_dir.rmdir()
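rmdir refuses to delete a directory that still has contents and raises an OSError instead. It has no recursive option, so for removing a whole tree the usual tool is the standard library’s shutil.rmtree, which accepts Path objects. A sketch with hypothetical names:

```python
import shutil
import tempfile
from pathlib import Path

rmdir_failed = False

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"  # hypothetical directory
    (project / "data").mkdir(parents=True)
    (project / "data" / "raw.csv").write_text("a,b\n1,2\n")

    try:
        project.rmdir()  # not empty, so this fails
    except OSError:
        rmdir_failed = True

    shutil.rmtree(project)  # removes the directory and everything inside it
    removed = not project.exists()

print(rmdir_failed, removed)  # True True
```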

To list the contents of a directory, like ls in the terminal, you can use iterdir. The result is a generator that yields the directory’s entries one by one as Path objects:

for p in Path.home().iterdir():
    print(p)

/home/bexgboost/.python_history
/home/bexgboost/word_counter.py
/home/bexgboost/.azure
/home/bexgboost/.npm
/home/bexgboost/.nv
/home/bexgboost/.julia
...
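iterdir yields entries in arbitrary order and mixes files with directories, so a common pattern is to filter and sort the results. A sketch with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "b.txt").write_text("b")  # hypothetical contents
    (folder / "a.csv").write_text("a")
    (folder / "subdir").mkdir()

    # Keep only regular files and sort them by name
    file_names = sorted(p.name for p in folder.iterdir() if p.is_file())
    dir_names = [p.name for p in folder.iterdir() if p.is_dir()]

print(file_names)  # ['a.csv', 'b.txt']
print(dir_names)   # ['subdir']
```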

To grab all files with a particular extension or name pattern, you can use the glob method with a glob pattern (shell-style wildcards such as *, not a regular expression).

For example, below we find all text files in my home directory with glob("*.txt"):

home = Path.home()
text_files = list(home.glob("*.txt"))

len(text_files)

3  # Only three

To search for text files recursively, i.e. inside all subdirectories as well, you can use the recursive glob, rglob:

all_text_files = [p for p in home.rglob("*.txt")]

len(all_text_files)

5116  # Now many more
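For a quick check of how these patterns behave: rglob(pattern) is equivalent to glob("**/" + pattern), and besides * you can use ? to match exactly one character. A sketch on a hypothetical tree:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["a.txt", "notes/b.txt", "notes/old/c.txt", "notes/d.md"]:
        f = root / rel  # hypothetical files
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("demo")

    top_level = sorted(p.name for p in root.glob("*.txt"))
    recursive = sorted(p.name for p in root.rglob("*.txt"))
    via_glob = sorted(p.name for p in root.glob("**/*.txt"))
    single_char = sorted(p.name for p in root.rglob("?.md"))  # '?' = one character

print(top_level)              # ['a.txt']
print(recursive)              # ['a.txt', 'b.txt', 'c.txt']
print(recursive == via_glob)  # True
print(single_char)            # ['d.md']
```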


You can also use rglob('*') to list directory contents recursively. It’s like a supercharged version of iterdir().

One use case for this is counting how often each file format appears within a directory.

To do this, we import the Counter class from collections and feed it the suffixes of all paths inside the articles folder in the home directory:

from collections import Counter

file_counts = Counter(
    path.suffix for path in (home / "articles").rglob("*")
)
file_counts

Counter({'.py': 12,
'': 1293,
'.md': 1,
'.txt': 7,
'.ipynb': 222,
'.png': 90,
'.mp4': 39})
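Counter.most_common ranks the suffixes, and the same loop can tally disk usage per suffix instead of file counts. The sketch below also skips directories, which have no suffix and would otherwise be counted under ''. It runs on a throwaway folder whose file names and sizes are made up:

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nb").mkdir()
    (root / "a.py").write_text("x" * 10)  # hypothetical files
    (root / "b.py").write_text("x" * 20)
    (root / "nb" / "c.ipynb").write_text("x" * 5)

    files = [p for p in root.rglob("*") if p.is_file()]  # skip directories
    counts = Counter(p.suffix for p in files)
    sizes = Counter()
    for p in files:
        sizes[p.suffix] += p.stat().st_size  # total bytes per suffix

print(counts.most_common(1))  # [('.py', 2)]
print(sizes[".py"], sizes[".ipynb"])  # 30 5
```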

Operating system differences

Sorry, but we have to talk about this nightmare of an issue.

Up until now, we have been dealing with PosixPath objects, which are the default on UNIX-like systems:

type(Path.home())

pathlib.PosixPath

If you were on Windows, you would get a WindowsPath object instead. Trying to create one on a UNIX-like system fails:

from pathlib import WindowsPath

# Use raw strings (prefixed with r) to write Windows paths
path = WindowsPath(r"C:\users")
path

NotImplementedError: cannot instantiate 'WindowsPath' on your system

Instantiating one other system’s path raises an error just like the above.

But what if you are forced to work with paths from another system, like code written by coworkers who use Windows?

As a solution, pathlib offers pure path objects like PureWindowsPath or PurePosixPath, which can be created on any operating system:

from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"C:\users")
path

PureWindowsPath('C:/users')

These are primitive path objects: they support the purely computational methods and attributes, but nothing that actually touches the file system:

path / "bexgboost"

PureWindowsPath('C:/users/bexgboost')

path.parent

PureWindowsPath('C:/')

path.stem

'users'

path.rename(r"C:\losers")  # Unsupported

AttributeError: 'PureWindowsPath' object has no attribute 'rename'
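Pure paths are also handy for translating a path between conventions. as_posix() gives a forward-slash string, and parts lets you rebuild the path under another root. The WSL-style /mnt/c mount point below is only an assumption for illustration:

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\users\bexgboost\data.csv")

print(win.as_posix())  # C:/users/bexgboost/data.csv
print(win.parts)       # ('C:\\', 'users', 'bexgboost', 'data.csv')

# Rebuild under an assumed mount point, dropping the drive part ('C:\\')
posix = PurePosixPath("/mnt/c", *win.parts[1:])
print(posix)           # /mnt/c/users/bexgboost/data.csv
```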

Conclusion

If you have noticed, I lied in the title of the article. Instead of 15, I believe the count of new tricks and functions was 30ish.

I didn’t want to scare you off.

But I hope I’ve convinced you enough to ditch os.path and start using pathlib for much easier and more readable path operations.

Forge a new path, if you will 🙂

[Image: Path. Source: Midjourney]
