How To Find The First Index Of Any Of A Set Of Characters In A String
Solution 1:
You can use enumerate and next with a generator expression, getting the first match or returning None if no character appears in s:
s = "Hello world!"
st = {"!"," "}
ind = next((i for i, ch inenumerate(s) if ch in st),None)
print(ind)
You can pass any value you want to next as a default return value if there is no match.
If you want to use a function and raise a ValueError:
deffirst_index(s, characters):
st = set(characters)
ind = next((i for i, ch inenumerate(s) if ch in st), None)
if ind isnotNone:
return ind
raise ValueError
For smaller inputs using a set won't make much if any difference but for large strings it will be a more efficient.
Some timings:
In the string, last character of character set:
In [40]: s = "Hello world!" * 100
In [41]: string = s
In [42]: %%timeit
st = {"x","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 1.71 µs per loop
In [43]: %%timeit
specials = ['x', 'y', '!']
min(map(lambda x: (string.index(x) if (x instring) elselen(string)), specials))
....:
100000 loops, best of 3: 2.64 µs per loop
Not in the string, larger character set:
In [44]: %%timeit
st = {"u","v","w","x","y","z"}
next((i for i, ch inenumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 1.49 µs per loop
In [45]: %%timeit
specials = ["u","v","w","x","y","z"]
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))
....:
100000 loops, best of 3: 5.48 µs per loop
In string an very first character of character set:
In [47]: %%timeit
specials = ['H', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))
....:
100000 loops, best of 3: 2.02 µs per loop
In [48]: %%timeit
st = {"H","y","!"}
next((i for i, ch inenumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 903 ns per loop
Solution 2:
I would favour the re
module as it's built in and already tested. It's also optimised for exactly this kind of thing.
>>>import re>>>re.search(r'[ !]', 'Hello World!').start()
5
You probably want to check that a match was found though or catch the exception when it's not.
There are reasons for not using re
, but I would want to see a good comment justifying the rational. Thinking that you can "do it better" is generally unnecessary, makes it harder for others to read the code and is less maintainable.
Solution 3:
Use a gen-exp and the find
method.
>>>a = [' ', '!']>>>s = "Hello World!">>>min(s.find(i) for i in a)
5
To remove -1
s if they occur, you can have a filter within the list comp
>>>a = [' ', '!','$']>>>s = "Hello World!">>>min(s.find(i) for i in a if i in s)
5
or you can substitute None
>>> min(s.find(i) if i in s else None for i in a)
5
Adding the timeit
results
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(s.find(i) for i in a if i in s)"1000000 loops, best of 3: 0.902 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';next((i for i, ch in enumerate(s) if ch in a),None)"1000000 loops, best of 3: 1.25 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))"1000000 loops, best of 3: 1.12 usec per loop
In your Example case, Padraic's beautiful solution is a little slow. However in large test cases, It is definitely a winner. (It's a little surprising that alfasin's "Not as optimized" is also faster here)
Added the implementation details
>>> deftake1(s,a):
... min(s.find(i) for i in a if i in s)
... >>> import dis
>>> dis.dis(take1)
20 LOAD_GLOBAL 0 (min)
3 LOAD_CLOSURE 0 (s)
6 BUILD_TUPLE 19 LOAD_CONST 1 (<code object <genexpr> at 0x7fa622e961b0, file "<stdin>", line 2>)
12 MAKE_CLOSURE 015 LOAD_FAST 1 (a)
18 GET_ITER
19 CALL_FUNCTION 122 CALL_FUNCTION 125 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
>>> deftake2(s,a):
... next((i for i, ch inenumerate(s) if ch in a),None)
... >>> dis.dis(take2)
20 LOAD_GLOBAL 0 (next)
3 LOAD_CLOSURE 0 (a)
6 BUILD_TUPLE 19 LOAD_CONST 1 (<code object <genexpr> at 0x7fa622e96e30, file "<stdin>", line 2>)
12 MAKE_CLOSURE 015 LOAD_GLOBAL 1 (enumerate)
18 LOAD_FAST 0 (s)
21 CALL_FUNCTION 124 GET_ITER
25 CALL_FUNCTION 128 LOAD_CONST 0 (None)
31 CALL_FUNCTION 234 POP_TOP
35 LOAD_CONST 0 (None)
38 RETURN_VALUE
>>> deftake3(s,a):
... min(map(lambda x: (s.index(x) if (x in s) elselen(s)), a))
... >>> dis.dis(take3)
20 LOAD_GLOBAL 0 (min)
3 LOAD_GLOBAL 1 (map)
6 LOAD_CLOSURE 0 (s)
9 BUILD_TUPLE 112 LOAD_CONST 1 (<code object <lambda> at 0x7fa622e44eb0, file "<stdin>", line 2>)
15 MAKE_CLOSURE 018 LOAD_FAST 1 (a)
21 CALL_FUNCTION 224 CALL_FUNCTION 127 POP_TOP
28 LOAD_CONST 0 (None)
31 RETURN_VALUE
As you can clearly see in Padraic's case the loading of the global functions next
and enumerate
are the ones which are killing the time along with None
at the end. In alfasin's solution the primary slowing down is the lambda
function.
Solution 4:
Not as optimized as Padraic Cunningham's solution, but still a one liner:
string = "Hello world!"
specials = [' ', '!', 'x']
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))
Post a Comment for "How To Find The First Index Of Any Of A Set Of Characters In A String"